Omics data cleaning and preprocessing are critical yet cumbersome steps. glyclean helps you perform these tasks with ease, so you can focus on the fun part: downstream analysis!
Installation
You can install the development version of glyclean from GitHub with:
```r
# install.packages("pak")
pak::pak("glycoverse/glyclean")
```
Role in glycoverse
As data preprocessing is an essential step in omics data analysis, glyclean plays a central role in the glycoverse ecosystem. It serves as the bridge between raw experimental data (imported via glyread) and downstream analysis, enabling other packages like glystats to work with clean, analysis-ready data.
Example
```r
library(glyexp)
library(glyclean)
#>
#> Attaching package: 'glyclean'
#> The following object is masked from 'package:stats':
#>
#>     aggregate
exp <- toy_experiment()
exp <- set_exp_type(exp, "glycomics")
clean_exp <- auto_clean(exp)
#> ℹ Normalizing data (Median Quotient)
#> ✔ Normalizing data (Median Quotient) [32ms]
#>
#> ℹ Removing variables with >50% missing values
#> ✔ Removing variables with >50% missing values [47ms]
#>
#> ℹ Imputing missing values
#> ℹ Sample size <= 30, using sample minimum imputation
#> ✔ Imputing missing values [9ms]
#>
#> ℹ Normalizing data (Total Area)
#> ℹ Detecting batch effects using ANOVA for 4 variables...
#> ✔ Batch effect detection completed. 0 out of 4 variables show significant batch effects (p < 0.05).
#> ℹ Batch effects detected in 0.0% of variables (<=10%). Skipping batch correction.
#> ✔ Normalizing data (Total Area) [52ms]
```
Yes, that’s it! Calling the magical auto_clean() function automatically performs the following steps, choosing the most suitable method for your data:
- Normalization
- Missing value filtering
- Imputation
- Batch effect correction (applied only when batch effects are detected, as the log above shows)

It also takes care of any other steps needed to prepare the data for downstream analysis.
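To make these steps more concrete, here is a minimal base-R sketch of what each one does to a plain numeric matrix with variables in rows and samples in columns. It is not glyclean's implementation: the sketch_* helpers are hypothetical names made up for illustration, reading "sample minimum imputation" as "replace each missing value with the minimum observed value of that variable" is an assumption, and auto_clean() chooses its methods based on the data, so its actual choices may differ. The step order follows the auto_clean() log above, and batch effect correction is left out because it requires batch labels.

```r
# Hypothetical base-R sketch of the preprocessing steps; NOT glyclean's API.
# Layout: rows = variables, columns = samples, NA = missing value.

# Median quotient normalization: divide each sample by the median of its
# ratios to a reference profile (the per-variable median across samples).
sketch_median_quotient <- function(mat) {
  reference <- apply(mat, 1, median, na.rm = TRUE)      # one value per variable
  quotients <- mat / reference                          # row i divided by reference[i]
  factors <- apply(quotients, 2, median, na.rm = TRUE)  # one factor per sample
  sweep(mat, 2, factors, "/")
}

# Missing value filtering: drop variables whose missing fraction exceeds max_missing.
sketch_filter_missing <- function(mat, max_missing = 0.5) {
  keep <- rowMeans(is.na(mat)) <= max_missing
  mat[keep, , drop = FALSE]
}

# Imputation (assumed interpretation): fill each NA with the minimum
# observed value of the same variable.
sketch_impute_min <- function(mat) {
  for (i in seq_len(nrow(mat))) {
    row <- mat[i, ]
    if (anyNA(row) && !all(is.na(row))) {
      row[is.na(row)] <- min(row, na.rm = TRUE)
      mat[i, ] <- row
    }
  }
  mat
}

# Total area normalization: scale each sample so its values sum to 1.
sketch_total_area <- function(mat) {
  sweep(mat, 2, colSums(mat, na.rm = TRUE), "/")
}

# Toy data: 5 variables x 4 samples with some missing values.
set.seed(1)
mat <- matrix(rlnorm(20), nrow = 5,
              dimnames = list(paste0("V", 1:5), paste0("S", 1:4)))
mat[1, 2] <- NA              # 25% missing: kept, then imputed
mat[2, c(1, 3, 4)] <- NA     # 75% missing: removed by the filter

clean <- sketch_median_quotient(mat)
clean <- sketch_filter_missing(clean)
clean <- sketch_impute_min(clean)
clean <- sketch_total_area(clean)
print(round(clean, 3))
```

Working on a bare matrix keeps the sketch dependency-free; in practice, auto_clean() operates on a glyexp experiment object, as in the example above.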