Automatic Imputation — auto_impute • glyclean

This function automatically selects and applies the most suitable imputation method for the given dataset. If Quality Control (QC) samples are present, the method that best stabilizes them (i.e., yields the lowest median coefficient of variation) is chosen. Otherwise, it defaults to a sample-size-based strategy:

30 samples or fewer: Sample minimum imputation
more than 30 and up to 100 samples: Minimum probability imputation
more than 100 samples: MissForest imputation
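
The fallback rule can be sketched in a few lines of base R. This is an illustration only, not glyclean's source code: pick_default_method() is a hypothetical helper, and the exact cutoffs (30 and 100, both inclusive on the lower tier) are assumptions inferred from the list above and from the message printed in the Examples section.

```r
# Hypothetical helper: maps a sample count to the name of the fallback
# imputation function. The cutoffs are assumptions, not taken from
# glyclean's source.
pick_default_method <- function(n_samples) {
  stopifnot(is.numeric(n_samples), length(n_samples) == 1, n_samples >= 0)
  if (n_samples <= 30) {
    "impute_sample_min"
  } else if (n_samples <= 100) {
    "impute_min_prob"
  } else {
    "impute_miss_forest"
  }
}

# Try a small, a medium, and a large study.
vapply(c(12, 60, 250), pick_default_method, character(1))
```

The helper returns a function name rather than calling anything, so it can be run without glyclean installed.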

Usage

auto_impute(
  exp,
  group_col = "group",
  qc_name = "QC",
  to_try = NULL,
  info = NULL
)

Arguments

exp

A glyexp::experiment().

group_col

The column name in sample_info for groups. Default is "group". Can be NULL when no group information is available.

qc_name

The name of QC samples in the group_col column. Default is "QC". Only used when group_col is not NULL. Can be NULL when no QC samples are available.

to_try

A list of imputation functions to try. By default, the following are included:

impute_zero(): zero imputation
impute_sample_min(): sample minimum imputation
impute_half_sample_min(): half sample minimum imputation
impute_sw_knn(): sample-wise KNN imputation
impute_fw_knn(): feature-wise KNN imputation
impute_bpca(): BPCA imputation
impute_ppca(): PPCA imputation
impute_svd(): SVD imputation
impute_min_prob(): minimum probability imputation
impute_miss_forest(): MissForest imputation

info

Internal parameter used by auto_clean().
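
To make the roles of group_col and qc_name concrete, the following base R sketch shows one way QC samples could be located in a sample information table. find_qc_samples() and the toy sample_info data frame are hypothetical; how auto_impute() itself handles a missing column or NULL arguments internally is not documented here, so the behavior below is an assumption.

```r
# Hypothetical helper: returns a logical vector marking QC samples,
# or all FALSE when group or QC information is unavailable.
find_qc_samples <- function(sample_info, group_col = "group", qc_name = "QC") {
  no_qc <- rep(FALSE, nrow(sample_info))
  if (is.null(group_col) || is.null(qc_name)) return(no_qc)
  if (!group_col %in% names(sample_info)) return(no_qc)
  as.character(sample_info[[group_col]]) == qc_name
}

# Toy sample table with two QC injections.
sample_info <- data.frame(
  sample = c("S1", "S2", "S3", "S4"),
  group  = c("case", "control", "QC", "QC")
)

is_qc <- find_qc_samples(sample_info)
sum(is_qc)                                           # number of QC samples
any(find_qc_samples(sample_info, group_col = NULL))  # no group info, no QC
```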

Value

The imputed experiment.

Details

By default, all imputation methods are included for benchmarking when QC samples are available. Note that some methods (e.g., MissForest) may be slow for large datasets.
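
The QC-based selection criterion (lowest median coefficient of variation) can be illustrated on a toy matrix. This is a sketch under stated assumptions, not the package's implementation: median_cv(), fill_zero(), and fill_sample_min() are hypothetical stand-ins written in base R, and the matrix is laid out with features in rows and QC samples in columns.

```r
# Median coefficient of variation (CV) across features for a
# features-by-samples matrix of QC intensities.
median_cv <- function(mat) {
  cvs <- apply(mat, 1, function(x) sd(x) / mean(x))
  median(cvs, na.rm = TRUE)
}

# Toy stand-ins for imputation methods (hypothetical, not glyclean functions).
fill_zero <- function(mat) {
  mat[is.na(mat)] <- 0
  mat
}
fill_sample_min <- function(mat) {
  # Replace each missing value with the minimum observed in its sample (column).
  for (j in seq_len(ncol(mat))) {
    col <- mat[, j]
    mat[is.na(col), j] <- min(col, na.rm = TRUE)
  }
  mat
}

# Toy QC matrix: 4 features x 4 QC samples with a few missing values.
qc <- matrix(
  c(10, 11, NA, 10,
    20, NA, 21, 19,
     5,  5,  6, NA,
    40, 42, 41, 39),
  nrow = 4, byrow = TRUE
)

# Score every candidate on the QC samples and keep the most stable one.
candidates <- list(zero = fill_zero, sample_min = fill_sample_min)
scores <- vapply(candidates, function(f) median_cv(f(qc)), numeric(1))
best <- names(scores)[which.min(scores)]
best
```

Because every candidate must be run once before the scores can be compared, the cost of this benchmarking step grows with the number of methods tried, which is why slow methods dominate the runtime on large datasets.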

Examples

library(glyclean)  # provides auto_impute()
library(glyexp)    # provides the real_experiment example dataset
exp_imputed <- auto_impute(real_experiment)
#> ℹ No QC samples found. Using default imputation method based on sample size.
#> ℹ Sample size <= 30, using `impute_sample_min()`.