
Quantify Motifs in an Experiment
This function quantifies motifs from glycomic or glycoproteomic profiles. For glycomics data, it calculates motif quantifications directly. For glycoproteomics data, each glycosite is treated as a separate glycome, and motif quantifications are calculated in a site-specific manner.
The function takes a glyexp::experiment() object and returns a new glyexp::experiment()
object with motif quantifications. Instead of containing quantifications of individual glycans
on each glycosite in each sample, the new experiment contains quantifications
of each motif on each glycosite in each sample (for glycoproteomics data) or
motif quantifications in each sample (for glycomics data).
Usage
quantify_motifs(
  exp,
  motifs,
  method = "relative",
  alignments = NULL,
  ignore_linkages = FALSE
)
Arguments
- exp
A glyexp::experiment() object. Before using this function, you should preprocess the data using the glyclean package. For glycoproteomics data, the data should be aggregated to the "gfs" (glycoforms with structures) level using glyclean::aggregate(). Also, make sure that the glycan_structure column is present in the var_info table, as not all glycoproteomics identification software provides this information. The glycan_structure column can be a glyrepr::glycan_structure() vector or a character vector of glycan structure strings supported by glyparse::auto_parse().
- motifs
A character vector of motif names, glycan structure strings, or a 'glyrepr_structure' object. For glycan structure strings, all formats supported by glyparse::auto_parse() are accepted, including IUPAC-condensed, WURCS, GlycoCT, and others. If the vector is named, the names will be used as motif names. Otherwise, IUPAC-condensed structure strings will be used as motif names.
- method
A character string specifying the quantification method. Must be either "absolute" or "relative". Default is "relative". See "Relative and Absolute Motif Quantification" section for details.
- alignments
A character vector specifying the alignment method for each motif. Can be "terminal", "substructure", "core", or "whole". Default is "substructure". See glymotif::have_motifs() for details.
- ignore_linkages
A logical value. If TRUE, linkages will be ignored in the comparison. See glymotif::have_motifs() for details.
Value
A new glyexp::experiment() object containing motif quantifications.
Instead of containing quantifications of individual glycans on each glycosite in each sample,
the new experiment contains quantifications of each motif on each glycosite in each sample
(for glycoproteomics data) or motif quantifications in each sample (for glycomics data).
For glycoproteomics data, the var_info table has these additional columns:
- protein: protein ID
- protein_site: the glycosite position on the protein
Other columns in the var_info table (e.g., gene) are retained if they have a "many-to-one"
relationship with glycosites (unique combinations of protein, protein_site).
That is, each glycosite cannot have multiple values for these columns.
gene is a common example, as a glycosite can only be associated with one gene.
Glycan descriptions are not such columns, as a glycosite can have multiple glycans,
thus having multiple descriptions.
Columns that do not have this relationship with glycosites will be dropped.
Don't worry if this logic is unclear;
just know that this function will do its best to preserve useful information.
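The "many-to-one" rule for var_info columns can be illustrated with a small base R sketch. This is not the package's implementation: the table below, the glycan_desc column, and the is_many_to_one() helper are hypothetical and exist only to show which columns would survive under the rule described above.

```r
# Hypothetical variable information table: one row per glycoform.
var_info <- data.frame(
  protein      = c("P1", "P1", "P2", "P2"),
  protein_site = c(10L, 10L, 55L, 55L),
  gene         = c("GENE1", "GENE1", "GENE2", "GENE2"),
  glycan_desc  = c("high-mannose", "complex", "complex", "hybrid")
)

# A glycosite is a unique (protein, protein_site) combination.
site <- interaction(var_info$protein, var_info$protein_site, drop = TRUE)

# Hypothetical helper: a column is "many-to-one" if every glycosite
# maps to exactly one distinct value of that column.
is_many_to_one <- function(column, site) {
  all(tapply(column, site, function(x) length(unique(x))) == 1L)
}

# Check every column other than the two that define the glycosite.
other_cols <- setdiff(names(var_info), c("protein", "protein_site"))
keep <- other_cols[
  vapply(var_info[other_cols], is_many_to_one, logical(1), site = site)
]
print(keep)
```

Here gene has a single value per glycosite and is kept, while glycan_desc varies within a glycosite and would be dropped.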
The sample_info and meta_data tables are not modified,
except that the exp_type field in meta_data is set to "traitomics" for glycomics data
and "traitproteomics" for glycoproteomics data.
Relative and Absolute Motif Quantification
Motif quantification can be performed in two ways: absolute and relative. Do not confuse this with the absolute and relative quantification of glycans or glycopeptides.
In an omics context, absolute quantification means we can determine the actual concentration (e.g., in mg/L) or counts (e.g., mRNA copy numbers), while relative quantification means we can only compare the abundance of the same molecule between different samples, but the absolute values themselves are not meaningful. Label-free quantification is a relative quantification method.
In the context of motif quantification, absolute and relative refer to whether we normalize the motif quantifications by the total abundance of the glycome (or the mini-glycomes of individual glycosites).
For example, let's say we have a glycomics dataset with two samples, A and B. In each sample, three glycans (G1, G2, G3) are present, with the following abundances:
Sample A: G1 = 10, G2 = 20, G3 = 30
Sample B: G1 = 20, G2 = 40, G3 = 60
For simplicity, let's say the motif we want to quantify happens to appear once in all glycans.
For absolute motif quantification, we simply sum the abundances of the motif across all glycans:
Sample A: 10 × 1 + 20 × 1 + 30 × 1 = 60 (× 1 because the motif appears once in each glycan)
Sample B: 20 × 1 + 40 × 1 + 60 × 1 = 120
The results are clearly different.
However, if we quantify the motif in a relative way:
Sample A: (10 × 1 + 20 × 1 + 30 × 1) / (10 + 20 + 30) = 60 / 60 = 1
Sample B: (20 × 1 + 40 × 1 + 60 × 1) / (20 + 40 + 60) = 120 / 120 = 1
The results are identical!
Absolute motif quantification answers the question "how many motifs are there in the sample?", while relative motif quantification answers the question "if I pick one glycan molecule at random, how many motifs does it carry on average?"
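The arithmetic of the two-sample example can be reproduced in a few lines of base R. This is only a sketch of the calculation using the numbers given above; the object names (abund, motif_count) are illustrative and are not part of the package.

```r
# Glycan abundances: rows are glycans, columns are samples.
abund <- matrix(
  c(10, 20, 30,   # sample A
    20, 40, 60),  # sample B
  nrow = 3,
  dimnames = list(c("G1", "G2", "G3"), c("A", "B"))
)

# Number of times the motif occurs in each glycan (once in every glycan here).
motif_count <- c(G1 = 1, G2 = 1, G3 = 1)

# Absolute: abundance-weighted sum of motif counts in each sample.
absolute <- colSums(abund * motif_count)

# Relative: the weighted sum divided by the total glycan abundance,
# i.e. the abundance-weighted mean of motif counts in each sample.
relative <- absolute / colSums(abund)

print(absolute)
print(relative)
```

Changing motif_count (for example, setting some counts to 0 or 2) shows how the relative value reflects the average number of motifs per glycan rather than the total signal.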
So which method should you use? It depends on both your data type and research objectives. Here are some general guidelines:
For glycomics data, you should typically use relative motif quantification, because glycomics data is inherently compositional (search "compositional data" for more details).
For glycoproteomics data, the choice depends on your research question. If you want to compare observed motif abundance across samples, use absolute motif quantification. Note that many factors can cause one sample to have higher values than another, such as upregulation of enzymes responsible for the motif, or higher overall glycosylation site occupancy. If you want to understand the underlying regulatory mechanisms, use relative motif quantification to correct for differences in site occupancy.
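For glycoproteomics data, the effect of site occupancy on the two methods can be sketched in base R. The glycosites, abundances, and motif counts below are made up for illustration, and grouping with rowsum() is an assumption about how the per-site "mini-glycome" calculation could be written, not the package's code.

```r
# Hypothetical glycoform table: two glycosites, each carrying two glycans.
var_info <- data.frame(
  site        = c("P1_N10", "P1_N10", "P2_N55", "P2_N55"),
  motif_count = c(1, 0, 2, 1)
)

# Abundances: rows are glycoforms (same order as var_info), columns are samples.
# Sample S2 has twice the signal of S1 at site P1_N10 and the same at P2_N55.
abund <- cbind(
  S1 = c(10, 30, 5, 5),
  S2 = c(20, 60, 5, 5)
)

# Absolute: weighted sum of motif counts within each glycosite.
absolute <- rowsum(abund * var_info$motif_count, group = var_info$site)

# Relative: divide by the total abundance of each glycosite's mini-glycome.
relative <- absolute / rowsum(abund, group = var_info$site)

print(absolute)
print(relative)
```

At P1_N10 the absolute value doubles from S1 to S2 because the whole site is more abundant, while the relative value is unchanged because the glycan composition at that site is the same.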
Relationship with Derived Traits
Motif quantification is a special type of derived trait.
It is simply a weighted sum (wsum()) or weighted mean (wmean()) of the motif counts,
for absolute and relative motif quantification, respectively.
You can perform motif quantification manually using derive_traits().
Let's perform absolute motif quantification manually using derive_traits()
to better understand the process (we will use custom column meta-properties here):
# Add the meta-properties to the variable information tibble
motifs <- c(
  nLx = "Hex(??-?)[dHex(??-?)]HexNAc(??-", # Lewis x antigen
  nSLx = "NeuAc(??-?)Hex(??-?)[dHex(??-?)]HexNAc(??-" # Sialyl Lewis x antigen
)
exp_with_mps <- glymotif::add_motifs_int(exp, motifs)
# Define the traits
trait_fns <- list(Lx = wsum(nLx), SLx = wsum(nSLx))
# Calculate the traits
derive_traits(exp_with_mps, trait_fns = trait_fns, mp_cols = c("nLx", "nSLx"))
The code snippet above is equivalent to:
# Quantify the motifs
quantify_motifs(exp, motifs, method = "absolute")
In fact, this is essentially the implementation of the quantify_motifs() function,
except the actual implementation is more robust and user-friendly.
For relative motif quantification, simply replace wsum() with wmean(),
and everything else remains the same.
Examples
library(glyexp)
library(glyclean)
exp <- real_experiment |>
  auto_clean() |>
  slice_head_var(n = 10)
#> ℹ Normalizing data (Median)
#> ✔ Normalizing data (Median) [7ms]
#>
#> ℹ Removing variables with >50% missing values
#> ✔ Removing variables with >50% missing values [15ms]
#>
#> ℹ Imputing missing values
#> ℹ Sample size <= 30, using sample minimum imputation
#> ℹ Imputing missing values
#> ✔ Imputing missing values [15ms]
#>
#> ℹ Aggregating data
#> ✔ Aggregating data [1.6s]
#>
#> ℹ Normalizing data again
#> ✔ Normalizing data again [14ms]
#>
motifs <- c(
  nLx = "Hex(??-?)[dHex(??-?)]HexNAc(??-", # Lewis x antigen
  nSLx = "NeuAc(??-?)Hex(??-?)[dHex(??-?)]HexNAc(??-" # Sialyl Lewis x antigen
)
quantify_motifs(exp, motifs)
#>
#> ── Traitproteomics Experiment ──────────────────────────────────────────────────
#> ℹ Expression matrix: 12 samples, 8 variables
#> ℹ Sample information fields: group <fct>
#> ℹ Variable information fields: protein <chr>, protein_site <int>, motif <chr>, gene <chr>