
This function calculates derived traits from a glyexp::experiment() object. For glycomics data, it calculates the derived traits directly. For glycoproteomics data, each glycosite is treated as a separate glycome, and derived traits are calculated in a site-specific manner.
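To illustrate what "site-specific" means, here is a minimal base-R sketch. It is not glydet's implementation. It assumes a hypothetical long-format table `toy` with one row per glycan per glycosite per sample, and a hypothetical per-glycan flag `sialylated`. The trait computed here (share of abundance carried by sialylated glycans) is illustrative only and is not necessarily identical to any trait in basic_traits(). The key point is that abundances are normalized within each (glycosite, sample) group, so every glycosite behaves like its own glycome.

```r
# Toy data (hypothetical): abundance of each glycan on each glycosite in each sample
toy <- data.frame(
  sample     = rep(c("S1", "S2"), each = 4),
  site       = rep(c("P1_N10", "P1_N10", "P2_N55", "P2_N55"), times = 2),
  glycan     = rep(c("G1", "G2", "G3", "G4"), times = 2),
  sialylated = rep(c(TRUE, FALSE, TRUE, FALSE), times = 2),
  abundance  = c(30, 70, 10, 40, 50, 50, 20, 20)
)

# Hypothetical site-specific trait: within each (site, sample) group, the
# fraction of total abundance carried by sialylated glycans.
site_trait <- function(df) {
  groups <- split(df, list(df$site, df$sample), drop = TRUE)
  rows <- lapply(groups, function(g) {
    data.frame(
      site   = g$site[1],
      sample = g$sample[1],
      value  = sum(g$abundance[g$sialylated]) / sum(g$abundance)
    )
  })
  out <- do.call(rbind, rows)
  out <- out[order(out$site, out$sample), ]
  rownames(out) <- NULL
  out
}

res <- site_trait(toy)
res
```

Because the denominator is the total abundance at one glycosite in one sample, the value for P1_N10 is unaffected by glycans on P2_N55, which is the behavior the description above refers to.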

Usage

derive_traits(exp, trait_fns = NULL, mp_fns = NULL, mp_cols = NULL)

Arguments

exp

A glyexp::experiment() object. Before using this function, preprocess the data with the glyclean package. For glycoproteomics data, the data should be aggregated to the "gfs" (glycoforms with structures) level using glyclean::aggregate(). Also make sure that the glycan_structure column is present in the var_info table, as not all glycoproteomics identification software provides this information. "glycan_structure" can be either a glyrepr::glycan_structure() vector or a character vector of glycan structure strings supported by glyparse::auto_parse().

trait_fns

A named list of derived trait functions created by trait factories. Names of the list are the names of the derived traits. Default is NULL, which means all derived traits in basic_traits() are calculated.

mp_fns

A named list of meta-property functions. This parameter is useful if your trait functions use custom meta-properties other than those in all_mp_fns(). Default is NULL, which means all meta-properties in all_mp_fns() are used.

mp_cols

A character vector of column names in the var_info tibble to use as meta-properties. If the vector is named, the names are used as the meta-property names; otherwise, the column names are used. Meta-properties specified in mp_cols overwrite those introduced by mp_fns with the same names, including the built-in meta-properties. Default is NULL, which means no columns are used as meta-properties.
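The naming and overwrite rules for mp_cols can be sketched in base R as follows. Both helpers (`resolve_mp_names()` and `merge_meta_properties()`) are hypothetical and are not part of glydet; the meta-property name "nS" and the column names are placeholders, so check all_mp_fns() for the real built-in names. The sketch also assumes that, in a partially named vector, each unnamed element falls back to its own column name.

```r
# Hypothetical helper: derive meta-property names from an mp_cols-style vector.
# Assumption: named elements use their name; unnamed elements use the column name.
resolve_mp_names <- function(mp_cols) {
  nms <- names(mp_cols)
  if (is.null(nms)) nms <- rep("", length(mp_cols))
  nms[is.na(nms)] <- ""
  ifelse(nms == "", unname(mp_cols), nms)
}

# Hypothetical merge step: meta-properties taken from var_info columns replace
# same-named ones produced by mp_fns (built-in ones included); new names are appended.
merge_meta_properties <- function(from_fns, from_cols) {
  from_fns[names(from_cols)] <- from_cols
  from_fns
}

mp_cols  <- c(nS = "n_sialic_acids", "my_custom_col")  # placeholder column names
mp_names <- resolve_mp_names(mp_cols)

from_fns  <- list(nS = "computed by mp_fns", nF = "computed by mp_fns")
from_cols <- setNames(as.list(paste("column", mp_cols)), mp_names)
merged    <- merge_meta_properties(from_fns, from_cols)
str(merged)
```

Here "nS" ends up taken from the n_sialic_acids column, "nF" is left untouched, and my_custom_col is added under its own name.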

Value

A new glyexp::experiment() object for derived traits. Instead of "quantification of each glycan on each glycosite in each sample", the new experiment() contains "the value of each derived trait on each glycosite in each sample", with the following columns in the var_info table:

  • variable: variable ID

  • trait: derived trait name

For glycoproteomics data, the following columns are also included:

  • protein: protein ID

  • protein_site: the glycosite position on the protein

Other columns in the var_info table (e.g. gene) are retained if they have a "many-to-one" relationship with glycosites (unique combinations of protein and protein_site), that is, if each glycosite has only one value for that column. gene is a common example, as a glycosite can only be associated with one gene. Glycan descriptions are not such a column: a glycosite can carry multiple glycans and therefore multiple descriptions. Columns without this relationship with glycosites are dropped. Don't worry if this logic seems complicated; the point is that the function tries its best to preserve useful information.
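The "many-to-one" rule can be checked with a short base-R sketch. `many_to_one_cols()` is a hypothetical helper written for illustration, not glydet's internal code, and the toy var_info below (including its glycan_composition column) is made up. A column is kept only if every glycosite, identified by protein plus protein_site, has at most one distinct value in it.

```r
# Hypothetical helper: return the var_info columns that have at most one
# distinct value per glycosite (unique protein + protein_site combination).
many_to_one_cols <- function(var_info, site_cols = c("protein", "protein_site")) {
  candidates <- setdiff(names(var_info), site_cols)
  site_key <- interaction(var_info[site_cols], drop = TRUE)
  keep <- vapply(candidates, function(col) {
    n_distinct <- tapply(var_info[[col]], site_key, function(x) length(unique(x)))
    all(n_distinct <= 1)
  }, logical(1))
  candidates[keep]
}

# Toy var_info: two glycosites, each carrying two different glycans
var_info <- data.frame(
  protein            = c("P1", "P1", "P2", "P2"),
  protein_site       = c(10L, 10L, 55L, 55L),
  gene               = c("GENE1", "GENE1", "GENE2", "GENE2"),
  glycan_composition = c("H5N4", "H5N4S1", "H3N2", "H4N4F1")
)

many_to_one_cols(var_info)
```

gene passes the check because each site maps to one gene, while glycan_composition fails because each site has two values, mirroring the gene versus glycan-description example above.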

sample_info and meta_data are not modified, except that the exp_type field of meta_data is set to "traitomics" for glycomics data, and "traitproteomics" for glycoproteomics data.

Examples

library(glyexp)
library(glyclean)
#> 
#> Attaching package: ‘glyclean’
#> The following object is masked from ‘package:stats’:
#> 
#>     aggregate

exp <- real_experiment |>
  auto_clean() |>
  slice_sample_var(n = 100)
#>  Normalizing data (Median)
#>  Normalizing data (Median) [184ms]
#> 
#>  Removing variables with >50% missing values
#>  Removing variables with >50% missing values [19ms]
#> 
#>  Imputing missing values
#>  Sample size <= 30, using sample minimum imputation
#>  Imputing missing values
#>  Imputing missing values [24ms]
#> 
#>  Aggregating data
#>  Aggregating data [1.1s]
#> 
#>  Normalizing data again
#>  Normalizing data again [16ms]
#> 
trait_exp <- derive_traits(exp)
trait_exp
#> 
#> ── Traitproteomics Experiment ──────────────────────────────────────────────────
#>  Expression matrix: 12 samples, 896 variables
#>  Sample information fields: group <chr>
#>  Variable information fields: protein <chr>, protein_site <int>, trait <chr>, gene <chr>

# By default, only basic traits are calculated
names(basic_traits())
#>  [1] "TM"  "TH"  "TC"  "MM"  "CA2" "CA3" "CA4" "TF"  "TFc" "TFa" "TB"  "SG" 
#> [13] "GA"  "TS" 

# You can calculate all traits in `all_traits()`
more_trait_exp <- derive_traits(exp, trait_fns = all_traits())
more_trait_exp
#> 
#> ── Traitproteomics Experiment ──────────────────────────────────────────────────
#>  Expression matrix: 12 samples, 3968 variables
#>  Sample information fields: group <chr>
#>  Variable information fields: protein <chr>, protein_site <int>, trait <chr>, gene <chr>