
Remove Variables with Missing Values
remove_missing_variables.Rd
Remove Variables with Missing Values
Arguments
- x
Either a
glyexp_experiment
object or a matrix. If a matrix, rows should be variables and columns should be samples.- prop
The proportion of missing values to use as a threshold. Variables with missing values above this threshold will be removed. Defaults to 0.5.
- n
The number of missing values to use as a threshold. An alternative to
prop
.- by
Either a column name in
sample_info
(string) or a factor/vector specifying group assignments for each sample. Missing value counts or proportions will be calculated within each group.- strict
Works with
by
. IfFALSE
, remove a variable only if it passes the missing threshold in all groups. IfTRUE
, remove a variable if it passes the missing threshold in any group. See examples for more details.- min_n
The minimum number of non-missing values required for a variable to be kept. If
NULL
(default), it is calculated dynamically:For datasets with 1, 2, or 3 samples: min_n equals the sample count
For datasets with >3 samples: min_n = 3
When using
by
, the rule is applied within each group
Value
For glyexp_experiment
input, returns a modified glyexp_experiment
object.
For matrix input, returns a filtered matrix.
Examples
# With glyexp_experiment
exp <- glyexp::toy_experiment()
exp$expr_mat[1, 1] <- NA # V1: 1/6 missing
exp$expr_mat[2, 1:3] <- NA # V2: 3/6 missing
exp$expr_mat[3, 1:5] <- NA # V3: 5/6 missing
exp$expr_mat[4, 1:6] <- NA # V4: 6/6 missing
exp$expr_mat
#> S1 S2 S3 S4 S5 S6
#> V1 NA 5 9 13 17 21
#> V2 NA NA NA 14 18 22
#> V3 NA NA NA NA NA 23
#> V4 NA NA NA NA NA NA
# Remove variables with more than 50% missing values.
remove_missing_variables(exp, prop = 0.5)$expr_mat
#> S1 S2 S3 S4 S5 S6
#> V1 NA 5 9 13 17 21
#> V2 NA NA NA 14 18 22
# Remove variables with more than 2 missing values.
remove_missing_variables(exp, n = 2)$expr_mat
#> S1 S2 S3 S4 S5 S6
#> V1 NA 5 9 13 17 21
# Remove variables if they have more than 1 missing value in all groups.
# In another word, keep variables as long as they have 1 or 0 missing value
# in any group.
remove_missing_variables(exp, by = "group", strict = FALSE)$expr_mat
#> S1 S2 S3 S4 S5 S6
#> V1 NA 5 9 13 17 21
#> V2 NA NA NA 14 18 22
# Keep only variables with no missing values.
remove_missing_variables(exp, prop = 0)$expr_mat
#> S1 S2 S3 S4 S5 S6
# Use custom min_n to require at least 4 non-missing values
remove_missing_variables(exp, min_n = 4)$expr_mat
#> S1 S2 S3 S4 S5 S6
#> V1 NA 5 9 13 17 21
# With matrix
mat <- matrix(c(1, 2, NA, 4, 5, NA, 7, 8, 9), nrow = 3)
mat_filtered <- remove_missing_variables(mat, prop = 0.5)