
Calculate Derived Traits
This function calculates derived traits from a glyexp::experiment()
object.
For glycomics data, it calculates the derived traits directly.
For glycoproteomics data, each glycosite is treated as a separate glycome,
and derived traits are calculated in a site-specific manner.
Arguments
- exp
A glyexp::experiment() object. Before using this function, you should preprocess the data using the glyclean package. For glycoproteomics data, the data should be aggregated to the "gfs" (glycoforms with structures) level using glyclean::aggregate(). Also, please make sure that the glycan_structure column is present in the var_info table, as not all glycoproteomics identification software provides this information. "glycan_structure" can be a glyrepr::glycan_structure() vector, or a character vector of glycan structure strings supported by glyparse::auto_parse().
- trait_fns
A named list of derived trait functions created by trait factories. Names of the list are the names of the derived traits. Default is NULL, which means all derived traits in basic_traits() are calculated.
- mp_fns
A named list of meta-property functions. This parameter is useful if your trait functions use custom meta-properties other than those in all_mp_fns(). Default is NULL, which means all meta-properties in all_mp_fns() are used.
- mp_cols
A character vector of column names in the var_info tibble to use as meta-properties. If names are provided, they will be used as names of the meta-properties; otherwise the column names will be used. Meta-properties specified in mp_cols will overwrite those introduced by mp_fns with the same names, including the built-in meta-properties. Default is NULL, which means no columns are used as meta-properties.
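As a rough illustration of the naming and precedence rules for mp_cols, the following base R sketch resolves meta-property names from a partly named character vector and then lets column-based entries override function-based ones with the same name. The column names, meta-property names and placeholder strings are all hypothetical, and this is not the package's actual implementation.

```r
# Hypothetical var_info column names; only the first entry is given a name.
mp_cols <- c(mp_a = "column_a", "column_b")

# Use the supplied name where there is one, otherwise the column name itself.
mp_names <- names(mp_cols)
if (is.null(mp_names)) mp_names <- rep("", length(mp_cols))
unnamed <- mp_names == ""
mp_names[unnamed] <- mp_cols[unnamed]

# Placeholder meta-properties standing in for those introduced by mp_fns.
from_fns <- list(mp_a = "from mp_fns", mp_c = "from mp_fns")

# Placeholder meta-properties taken from var_info columns.
from_cols <- setNames(as.list(paste("from column", mp_cols)), mp_names)

# Entries from mp_cols replace same-named entries from mp_fns.
resolved <- utils::modifyList(from_fns, from_cols)
str(resolved)
```

In this sketch, mp_a ends up sourced from the column because both lists define it, while mp_c is left untouched and column_b is added under its own column name.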
Value
A new glyexp::experiment() object for derived traits.
Instead of "quantification of each glycan on each glycosite in each sample",
the new experiment() contains "the value of each derived trait on each glycosite in each sample",
with the following columns in the var_info table:
- variable: variable ID
- trait: derived trait name
For glycoproteomics data, the table has these additional columns:
- protein: protein ID
- protein_site: the glycosite position on the protein
Other columns in the var_info table (e.g. gene) are retained if they have a "many-to-one"
relationship with glycosites (unique combinations of protein and protein_site).
That is, each glycosite cannot have multiple values for these columns.
gene is a common example, as a glycosite can only be associated with one gene.
Glycan descriptions are not such a column, as a glycosite can have multiple glycans,
and thus multiple descriptions.
Columns that do not have this relationship with glycosites are dropped.
Don't worry if you cannot understand this logic,
as long as you know that this function will try its best to preserve useful information.
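To make the "many-to-one" rule more concrete, here is a minimal base R sketch that checks, for a toy var_info table, which extra columns take exactly one value per glycosite. The table contents, the glycan_description column and the is_many_to_one() helper are made up for illustration; the package may implement this check differently.

```r
# Toy var_info table; all values are invented for illustration.
var_info <- data.frame(
  protein = c("P1", "P1", "P1", "P2"),
  protein_site = c(10L, 10L, 25L, 7L),
  gene = c("G1", "G1", "G1", "G2"),
  glycan_description = c("complex", "high-mannose", "complex", "hybrid"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: TRUE if `col` has exactly one distinct value
# within every glycosite (unique protein + protein_site combination).
is_many_to_one <- function(data, col, keys = c("protein", "protein_site")) {
  n_values <- tapply(
    data[[col]],
    interaction(data[keys], drop = TRUE),
    function(x) length(unique(x))
  )
  all(n_values == 1L)
}

# Check every column other than the glycosite keys.
other_cols <- setdiff(names(var_info), c("protein", "protein_site"))
keep <- vapply(
  other_cols,
  function(col) is_many_to_one(var_info, col),
  logical(1)
)
keep
```

Here gene passes the check because each glycosite maps to a single gene, whereas glycan_description fails because site 10 of P1 carries two different descriptions.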
sample_info and meta_data are not modified,
except that the exp_type field of meta_data
is set to "traitomics" for glycomics data,
and "traitproteomics" for glycoproteomics data.
Examples
library(glyexp)
library(glyclean)
#>
#> Attaching package: ‘glyclean’
#> The following object is masked from ‘package:stats’:
#>
#> aggregate
exp <- real_experiment |>
  auto_clean() |>
  slice_sample_var(n = 100)
#> ℹ Normalizing data (Median)
#> ✔ Normalizing data (Median) [184ms]
#>
#> ℹ Removing variables with >50% missing values
#> ✔ Removing variables with >50% missing values [19ms]
#>
#> ℹ Imputing missing values
#> ℹ Sample size <= 30, using sample minimum imputation
#> ℹ Imputing missing values
#> ✔ Imputing missing values [24ms]
#>
#> ℹ Aggregating data
#> ✔ Aggregating data [1.1s]
#>
#> ℹ Normalizing data again
#> ✔ Normalizing data again [16ms]
#>
trait_exp <- derive_traits(exp)
trait_exp
#>
#> ── Traitproteomics Experiment ──────────────────────────────────────────────────
#> ℹ Expression matrix: 12 samples, 896 variables
#> ℹ Sample information fields: group <chr>
#> ℹ Variable information fields: protein <chr>, protein_site <int>, trait <chr>, gene <chr>
# By default, only basic traits are calculated
names(basic_traits())
#> [1] "TM" "TH" "TC" "MM" "CA2" "CA3" "CA4" "TF" "TFc" "TFa" "TB" "SG"
#> [13] "GA" "TS"
# You can calculate all traits in `all_traits()`
more_trait_exp <- derive_traits(exp, trait_fns = all_traits())
more_trait_exp
#>
#> ── Traitproteomics Experiment ──────────────────────────────────────────────────
#> ℹ Expression matrix: 12 samples, 3968 variables
#> ℹ Sample information fields: group <chr>
#> ℹ Variable information fields: protein <chr>, protein_site <int>, trait <chr>, gene <chr>