Omics data cleaning and preprocessing is a critical yet cumbersome step. glyclean helps you perform these tasks with ease, so you can focus on the fun part: downstream analysis!
Installation
You can install the latest release of glyclean from GitHub with:
# install.packages("remotes")
remotes::install_github("glycoverse/glyclean@*release")

Or install the development version:

remotes::install_github("glycoverse/glyclean")

Role in glycoverse
As data preprocessing is an essential step in omics data analysis, glyclean plays a central role in the glycoverse ecosystem. It serves as the bridge between raw experimental data (imported via glyread) and downstream analysis, enabling other packages like glystats to work with clean, analysis-ready data.
Example
library(glyexp)
library(glyclean)
#>
#> Attaching package: 'glyclean'
#> The following object is masked from 'package:stats':
#>
#> aggregate
exp <- real_experiment
clean_exp <- auto_clean(exp)
#>
#> ── Normalizing data ──
#>
#> No QC samples found. Using default normalization method based on experiment
#> type.
#> Experiment type is "glycoproteomics". Using `normalize_median()`.
#>
#> ── Removing variables with too many missing values ──
#>
#> No QC samples found. Using all samples.
#> Applying preset "discovery"...
#> Total removed: 24 (0.56%) variables.
#>
#> ── Imputing missing values ──
#>
#> No QC samples found. Using default imputation method based on sample size.
#> Sample size <= 30, using `impute_sample_min()`.
#>
#> ── Aggregating data ──
#>
#> Aggregating to "gfs" level
#>
#> ── Normalizing data again ──
#>
#> No QC samples found. Using default normalization method based on experiment
#> type.
#> Experiment type is "glycoproteomics". Using `normalize_median()`.
#>
#> ── Correcting batch effects ──
#>
#> ℹ Batch column not found in sample_info. Skipping batch correction.

Yes, that's it! Calling the magical auto_clean() function automatically performs the following steps, in the way best suited to your data:
- Normalization
- Missing value filtering
- Imputation
- Batch effect correction
along with any other steps needed to produce analysis-ready data.
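If you prefer finer control, the individual steps can also be run one at a time. Below is a minimal sketch of such a manual pipeline, assuming the pipe-friendly signatures implied by the log above. The function names `normalize_median()`, `impute_sample_min()`, and `aggregate()` appear in auto_clean()'s output; the exact argument names (e.g. `to_level`) are assumptions, and the missing-value filtering step is omitted because its function name is not shown in the log.

```r
library(glyclean)

# Manual version of the auto_clean() pipeline shown above.
# Argument names are illustrative assumptions -- check each
# function's help page (e.g. ?normalize_median) for the real API.
exp <- real_experiment

clean_exp <- exp |>
  normalize_median() |>           # median normalization (no QC samples)
  impute_sample_min() |>          # sample-minimum imputation for small n
  aggregate(to_level = "gfs") |>  # aggregate to the "gfs" level
  normalize_median()              # re-normalize after aggregation
```

Running the steps individually is useful when one default does not suit your data, for example swapping in a different normalization method while keeping the rest of the pipeline unchanged.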
