
Get Started with glydet
Glycan derived traits are summary features calculated from individual glycan abundances. Instead of analyzing every single glycan structure, we combine related glycans into biologically meaningful groups, such as traits describing the overall level of galactosylation, sialylation, fucosylation, or branching. These derived traits capture broader patterns in glycosylation and reduce noise from individual measurements.
Compared with analyzing raw glycan abundances (the “direct traits”), using derived traits has several advantages. It simplifies the data while keeping key biological information, making it easier to interpret and compare across samples. Derived traits also tend to be more robust and less affected by technical variation, and they can better highlight biological trends or associations with phenotypes. In short, derived traits help us see the forest rather than just the trees 🌲🌲🌲.
Enter glydet 🚀, your toolkit for calculating derived
traits. But here’s where it gets exciting:
glydet brings the power of derived traits to the
glycoproteomics community for the first time, enabling
site-specific trait analysis. Plus, it features an
intuitive domain-specific language that lets you define custom traits
tailored to your research needs.
Important Notes Before You Start
Prerequisites
This package is built on the glyrepr package and relies heavily on the glyexp package. If you are not familiar with these two packages, we highly recommend reading their introductions first. Also, to fully understand the concepts and functions in this package, a basic understanding of the glymotif package is recommended.
Data Types
glydet is designed to work with untargeted glycomics and
glycoproteomics data. Label-free quantification data is readily
supported by glyread and glyclean. For isobaric labeling
quantification such as TMT, the ratios between the target and
reference channels (TMT ratios) must first be converted to an
abundance matrix with the following procedure:
- For each glycopeptide, calculate the median of its MS2 summed intensities across all reference channels, using the unnormalized intensity matrix
- Multiply the TMT ratios of each glycopeptide by this median
This makes the quantities of different glycopeptides within the same sample comparable.
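The two-step TMT conversion can be sketched in base R. This is a minimal illustration on made-up numbers, not a glyread or glyclean function: ref_intensity (unnormalized reference-channel MS2 summed intensities, one column per TMT plex) and tmt_ratio (target/reference ratios, one column per sample) are hypothetical inputs with glycopeptides as rows.

```r
# Hypothetical toy inputs: 2 glycopeptides (rows).
# Unnormalized MS2 summed intensities of the reference channel in each TMT plex
ref_intensity <- matrix(
  c(100, 120, 110,
     50,  40,  60),
  nrow = 2, byrow = TRUE,
  dimnames = list(c("GP1", "GP2"), c("plex1", "plex2", "plex3"))
)
# TMT ratios (target channel / reference channel), one column per sample
tmt_ratio <- matrix(
  c(1.0, 2.0, 0.5, 1.5,
    0.8, 1.2, 1.0, 2.0),
  nrow = 2, byrow = TRUE,
  dimnames = list(c("GP1", "GP2"), c("S1", "S2", "S3", "S4"))
)

# Step 1: per-glycopeptide median of the reference-channel intensities
ref_median <- apply(ref_intensity, 1, median)   # GP1 = 110, GP2 = 50

# Step 2: scale each glycopeptide's ratios by its own median (row-wise)
abundance <- sweep(tmt_ratio, 1, ref_median, "*")
abundance
```

sweep() makes the row-wise scaling explicit instead of relying on R's column-major vector recycling, which would silently give wrong results if the median vector were ever misaligned with the rows.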
🎯 Dive Right In: Your First Analysis
library(glydet)
library(glyrepr)
library(glyexp)
library(glyclean)
#>
#> Attaching package: 'glyclean'
#> The following object is masked from 'package:stats':
#>
#> aggregate
library(glystats)
Ready to see glydet in action? Let’s jump straight into
a real-world example that demonstrates its power! We’ll work with
glyexp::real_experiment, an authentic N-glycoproteomics
dataset from 12 patients with varying liver conditions.
⚠️ Pro tip: Always preprocess your data with
glyclean before diving into trait analysis. Derived traits are
computed from glycan abundances, so normalization, missing-value
filtering, imputation, and aggregation should come first.
exp <- auto_clean(real_experiment) # Preprocess the data
#> ℹ Normalizing data (Median)
#> ✔ Normalizing data (Median) [130ms]
#>
#> ℹ Removing variables with >50% missing values
#> ✔ Removing variables with >50% missing values [71ms]
#>
#> ℹ Imputing missing values
#> ℹ Sample size <= 30, using sample minimum imputation
#> ✔ Imputing missing values [23ms]
#>
#> ℹ Aggregating data
#> ✔ Aggregating data [997ms]
#>
#> ℹ Normalizing data again
#> ✔ Normalizing data again [19ms]
exp
#>
#> ── Glycoproteomics Experiment ──────────────────────────────────────────────────
#> ℹ Expression matrix: 12 samples, 3880 variables
#> ℹ Sample information fields: group <fct>
#> ℹ Variable information fields: protein <chr>, glycan_composition <comp>, glycan_structure <struct>, protein_site <int>, gene <chr>
Let’s take a quick peek at our dataset to understand what we’re working with:
get_var_info(exp)
#> # A tibble: 3,880 × 6
#> variable protein glycan_composition glycan_structure protein_site gene
#> <chr> <chr> <comp> <struct> <int> <chr>
#> 1 V1 P08185 Hex(5)HexNAc(4)NeuAc(2) NeuAc(??-?)Hex(?… 176 SERP…
#> 2 V2 P04196 Hex(5)HexNAc(4)NeuAc(1) NeuAc(??-?)Hex(?… 344 HRG
#> 3 V3 P04196 Hex(5)HexNAc(4) Hex(??-?)HexNAc(… 344 HRG
#> 4 V4 P04196 Hex(5)HexNAc(4)NeuAc(1) NeuAc(??-?)Hex(?… 344 HRG
#> 5 V5 P10909 Hex(6)HexNAc(5) Hex(??-?)HexNAc(… 291 CLU
#> 6 V6 P04196 Hex(5)HexNAc(4)NeuAc(2) NeuAc(??-?)Hex(?… 344 HRG
#> 7 V7 P04196 Hex(5)HexNAc(4) Hex(??-?)HexNAc(… 345 HRG
#> 8 V8 P04196 Hex(5)HexNAc(4)dHex(2) dHex(??-?)Hex(??… 344 HRG
#> 9 V9 P04196 Hex(4)HexNAc(3) Hex(??-?)HexNAc(… 344 HRG
#> 10 V10 P04196 Hex(4)HexNAc(4)NeuAc(1) NeuAc(??-?)Hex(?… 344 HRG
#> # ℹ 3,870 more rows
get_sample_info(exp)
#> # A tibble: 12 × 2
#> sample group
#> <chr> <fct>
#> 1 C1 C
#> 2 C2 C
#> 3 C3 C
#> 4 H1 H
#> 5 H2 H
#> 6 H3 H
#> 7 M1 M
#> 8 M2 M
#> 9 M3 M
#> 10 Y1 Y
#> 11 Y2 Y
#> 12 Y3 Y
get_expr_mat(exp)[1:5, 1:5]
#> C1 C2 C3 H1 H2
#> V1 6.626760e+03 2.019159e+04 13432.7 4.072473e+04 1.771879e+04
#> V2 3.744595e+08 5.691652e+08 99531624.5 2.372164e+04 1.422307e+07
#> V3 5.260619e+08 5.644547e+08 211645556.7 9.149818e+08 8.534716e+08
#> V4 2.983928e+09 2.665752e+09 1207235166.5 3.410355e+09 3.918161e+09
#> V5 2.751569e+07 3.200443e+07 8055532.6 6.765746e+07 4.546455e+07
Now for the magic moment ✨: let’s calculate some derived traits!
trait_exp <- derive_traits(exp)
trait_exp
#>
#> ── Traitproteomics Experiment ──────────────────────────────────────────────────
#> ℹ Expression matrix: 12 samples, 3836 variables
#> ℹ Sample information fields: group <fct>
#> ℹ Variable information fields: protein <chr>, protein_site <int>, trait <chr>, gene <chr>
Voilà! What you see is a brand new experiment() object
of the “traitproteomics” type. Think of it as your original dataset’s
sophisticated cousin 🎭: instead of tracking “the quantification of each
glycan on each glycosite in each sample,” it now contains “the value of
each derived trait on each glycosite in each sample.”
get_var_info(trait_exp)
#> # A tibble: 3,836 × 5
#> variable protein protein_site trait gene
#> <chr> <chr> <int> <chr> <chr>
#> 1 V1 A6NJW9 49 TM CD8B2
#> 2 V2 A6NJW9 49 TH CD8B2
#> 3 V3 A6NJW9 49 TC CD8B2
#> 4 V4 A6NJW9 49 MM CD8B2
#> 5 V5 A6NJW9 49 CA2 CD8B2
#> 6 V6 A6NJW9 49 CA3 CD8B2
#> 7 V7 A6NJW9 49 CA4 CD8B2
#> 8 V8 A6NJW9 49 TF CD8B2
#> 9 V9 A6NJW9 49 TFc CD8B2
#> 10 V10 A6NJW9 49 TFa CD8B2
#> # ℹ 3,826 more rows
# These are the trait values!
get_expr_mat(trait_exp)[1:5, 1:5]
#> C1 C2 C3 H1 H2
#> V1 0 0 0 0 0
#> V2 0 0 0 0 0
#> V3 1 1 1 1 1
#> V4 NA NA NA NA NA
#> V5 1 1 1 1 1
🎉 Congratulations! You’ve just calculated a comprehensive suite of derived traits in a site-specific manner:
- TM: Proportion of high-mannose glycans
- TH: Proportion of hybrid glycans
- TC: Proportion of complex glycans
- MM: Average number of mannoses within high-mannose glycans
- CA2: Proportion of bi-antennary glycans within complex glycans
- CA3: Proportion of tri-antennary glycans within complex glycans
- CA4: Proportion of tetra-antennary glycans within complex glycans
- TF: Proportion of fucosylated glycans
- TFc: Proportion of core-fucosylated glycans
- TFa: Proportion of arm-fucosylated glycans
- TB: Proportion of glycans with bisecting GlcNAc
- GS: Average degree of sialylation per galactose
- AG: Average degree of galactosylation per antenna
- TS: Proportion of sialylated glycans
Note that MM (V4) is NA at this glycosite: its TM value is 0, so there are no high-mannose glycans to average over.
💡 The key insight: We treat the glycans on each
glycosite as a separate mini-glycome, then calculate derived traits for
each one across all samples. For instance, if a particular glycosite
hosts 10 different glycans, the TFc value represents the
proportion of core-fucosylated glycans within those 10 structures in
each sample.
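To make the per-glycosite idea concrete, here is a toy base R calculation of TFc. It assumes TFc is the abundance-weighted share of glycans with at least one core fucose (nFc > 0) among all glycans on a site; the data, the site labels, and the helper tfc() are hypothetical, and glydet’s own implementation may differ in details such as missing-value handling.

```r
# Hypothetical toy data: 4 glycoforms on 2 glycosites, measured in 2 samples
var_info <- data.frame(
  site = c("P1_N10", "P1_N10", "P1_N10", "P2_N55"),
  nFc  = c(1L, 0L, 1L, 0L)            # number of core fucoses per glycan
)
abund <- matrix(
  c(30, 10,
    60, 80,
    10, 10,
    50, 40),
  nrow = 4, byrow = TRUE,
  dimnames = list(NULL, c("S1", "S2"))
)

# TFc of one mini-glycome: abundance share of core-fucosylated glycans, per sample
tfc <- function(a, nFc) {
  colSums(a[nFc > 0, , drop = FALSE]) / colSums(a)
}

# Treat the glycans on each glycosite as a separate mini-glycome
rows_by_site <- split(seq_len(nrow(abund)), var_info$site)
tfc_by_site <- t(sapply(rows_by_site, function(idx) {
  tfc(abund[idx, , drop = FALSE], var_info$nFc[idx])
}))
tfc_by_site
# P1_N10: S1 = 0.4, S2 = 0.2; P2_N55: 0 in both (no core-fucosylated glycans)
```

The same glycan can therefore contribute very different weights at different sites: only glycans sharing a glycosite enter each denominator.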
Important Note: All built-in derived traits work only for N-glycans. For other types of glycans, you need to define your own derived traits; the Defining Custom Traits vignette shows how. For glycan types that are less complex than N-glycans, such as O-glycans, we recommend using motif quantification instead.
Now comes the fun part! 📊 You can use all the functions in the
glystats package to analyze your derived traits. Let’s demonstrate
with an ANOVA that tests every trait on every glycosite, and then keep
the glycosites with significantly different levels of core fucosylation
(TFc) across conditions:
anova_res <- gly_anova(trait_exp)
#> ℹ Number of groups: 4
#> ℹ Groups: "H", "M", "Y", and "C"
#> ℹ Pairwise comparisons will be performed, with levels coming first as reference groups.
#> Warning: 267 variables failed to fit the model
anova_res$tidy_result$main_test |>
dplyr::filter(trait == "TFc", p_adj < 0.05)
#> # A tibble: 12 × 13
#> variable term df sumsq meansq statistic p_val p_adj post_hoc
#> <chr> <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <chr>
#> 1 V1115 group 3 0.0000941 0.0000314 26.2 1.72e-4 7.86e-3 H_vs_M;…
#> 2 V1227 group 3 0.0629 0.0210 14.0 1.50e-3 3.08e-2 H_vs_Y;…
#> 3 V1353 group 3 0.00231 0.000770 19.3 5.05e-4 1.63e-2 H_vs_C;…
#> 4 V1381 group 3 0.00640 0.00213 14.9 1.23e-3 2.71e-2 H_vs_Y;…
#> 5 V1661 group 3 0.0299 0.00998 14.3 1.40e-3 2.94e-2 H_vs_M;…
#> 6 V1675 group 3 0.0174 0.00581 43.1 2.78e-5 2.92e-3 H_vs_M;…
#> 7 V2165 group 3 0.0174 0.00581 172. 1.34e-7 1.02e-4 H_vs_M;…
#> 8 V2487 group 3 0.0644 0.0215 74.1 3.53e-6 8.98e-4 H_vs_M;…
#> 9 V2837 group 3 0.00547 0.00182 27.5 1.45e-4 7.05e-3 H_vs_M;…
#> 10 V457 group 3 0.000548 0.000183 52.2 1.34e-5 2.57e-3 H_vs_C;…
#> 11 V709 group 3 0.0771 0.0257 22.4 3.00e-4 1.23e-2 H_vs_M;…
#> 12 V919 group 3 0.00365 0.00122 31.9 8.46e-5 5.01e-3 H_vs_C;…
#> # ℹ 4 more variables: protein <chr>, protein_site <int>, trait <chr>,
#> #   gene <chr>
🔍 Discovery time! We’ve identified 12 glycosites whose core-fucosylation levels (TFc) differ significantly across the patient groups (adjusted p < 0.05). This is exactly the kind of biological insight that makes derived traits so powerful!
🔧 Under the Hood: Understanding Meta-Properties
Curious about how the magic happens? Let’s lift the hood and explore
glydet’s inner workings—but don’t worry, we’ll keep things
accessible!
The key concept you need to understand is “meta-properties” - think of them as the molecular fingerprints of individual glycans.
🆚 What’s the difference?
- Derived traits describe entire glycomes (or all glycans on a glycosite) and their values fluctuate across samples
- Meta-properties describe individual glycans regardless of their abundance, such as the number of antennae, core fucoses, or sialic acids on a single structure
🧠 The connection: Meta-properties are the building
blocks for derived traits. When you call derive_traits(),
glydet automatically calculates meta-properties for all
glycans first, then uses this information to compute the derived traits
you see.
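The sketch below illustrates, in plain base R, how per-glycan meta-properties and abundances could combine into the three flavours of built-in traits listed earlier: a proportion among all glycans (TS), a proportion within a subset (CA2), and an average within a subset (MM). The meta-property table and abundances are made up, the column names merely mirror glydet’s, and the helpers prop(), prop_within(), and wmean_within() are hypothetical; glydet’s actual formulas may differ.

```r
# Hypothetical meta-properties of 4 glycans (column names mirror glydet's)
mp <- data.frame(
  Tp = c("complex", "complex", "highmannose", "complex"),
  nA = c(2L, 3L, 0L, 2L),   # antennae
  nS = c(2L, 0L, 0L, 1L),   # sialic acids
  nM = c(3L, 3L, 8L, 3L)    # mannoses
)
abund <- c(40, 20, 30, 10)  # abundances of these glycans in one sample

# Proportion of glycans meeting `cond` among all glycans
prop <- function(cond) sum(abund[cond]) / sum(abund)

# Proportion of glycans meeting `cond` within the subset `within`
prop_within <- function(cond, within) {
  sum(abund[cond & within]) / sum(abund[within])
}

# Abundance-weighted mean of a meta-property within the subset `within`
wmean_within <- function(value, within) {
  sum(abund[within] * value[within]) / sum(abund[within])
}

TS  <- prop(mp$nS > 0)                              # 50 / 100 = 0.5
CA2 <- prop_within(mp$nA == 2, mp$Tp == "complex")  # 50 / 70
MM  <- wmean_within(mp$nM, mp$Tp == "highmannose")  # 8
```

Because the meta-properties do not change between samples, they only need to be computed once per glycan; only the abundance vector changes from sample to sample.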
Want to work with meta-properties directly? 🛠️ You’re in luck!
glydet provides two handy functions:
- get_meta_properties(): Calculate meta-properties for any set of glycans
- add_meta_properties(): Enrich your experiment() object by adding meta-properties to its variable information
🔬 get_meta_properties()
Let’s see get_meta_properties() in action! We’ll extract
a few glycan structures from our dataset:
glycans <- unique(get_var_info(exp)$glycan_structure)[1:5]
glycans
#> <glycan_structure[5]>
#> [1] NeuAc(??-?)Hex(??-?)HexNAc(??-?)Hex(??-?)[NeuAc(??-?)Hex(??-?)HexNAc(??-?)Hex(??-?)]Hex(??-?)HexNAc(??-?)HexNAc(??-
#> [2] NeuAc(??-?)Hex(??-?)HexNAc(??-?)[HexNAc(??-?)]Hex(??-?)[Hex(??-?)Hex(??-?)]Hex(??-?)HexNAc(??-?)HexNAc(??-
#> [3] Hex(??-?)HexNAc(??-?)Hex(??-?)[Hex(??-?)HexNAc(??-?)Hex(??-?)]Hex(??-?)HexNAc(??-?)HexNAc(??-
#> [4] NeuAc(??-?)Hex(??-?)HexNAc(??-?)Hex(??-?)[Hex(??-?)HexNAc(??-?)Hex(??-?)]Hex(??-?)HexNAc(??-?)HexNAc(??-
#> [5] Hex(??-?)HexNAc(??-?)Hex(??-?)HexNAc(??-?)Hex(??-?)[Hex(??-?)HexNAc(??-?)Hex(??-?)]Hex(??-?)HexNAc(??-?)HexNAc(??-
#> # Unique structures: 5
📝 Note: glycans is a
glyrepr::glycan_structure() vector, a standardized
representation of glycan structures.
Now watch the magic happen as we calculate their meta-properties:
get_meta_properties(glycans)
#> # A tibble: 5 × 10
#> Tp B nA nF nFc nFa nG nGt nS nM
#> <fct> <lgl> <int> <int> <int> <int> <int> <int> <int> <int>
#> 1 complex FALSE 2 0 0 0 2 0 2 3
#> 2 hybrid FALSE 2 0 0 0 1 0 1 4
#> 3 complex FALSE 2 0 0 0 2 2 0 3
#> 4 complex FALSE 2 0 0 0 2 1 1 3
#> 5 complex FALSE 2 0 0 0 2 1 0 4
📈 add_meta_properties()
Working with glyexp::experiment() objects? Perfect! You
can supercharge your variable information by adding meta-properties
directly:
exp_with_mp <- add_meta_properties(exp)
get_var_info(exp_with_mp)
#> # A tibble: 3,880 × 16
#> variable protein glycan_composition glycan_structure protein_site gene Tp
#> <chr> <chr> <comp> <struct> <int> <chr> <fct>
#> 1 V1 P08185 Hex(5)HexNAc(4)Ne… NeuAc(??-?)Hex(… 176 SERP… comp…
#> 2 V2 P04196 Hex(5)HexNAc(4)Ne… NeuAc(??-?)Hex(… 344 HRG hybr…
#> 3 V3 P04196 Hex(5)HexNAc(4) Hex(??-?)HexNAc… 344 HRG comp…
#> 4 V4 P04196 Hex(5)HexNAc(4)Ne… NeuAc(??-?)Hex(… 344 HRG comp…
#> 5 V5 P10909 Hex(6)HexNAc(5) Hex(??-?)HexNAc… 291 CLU comp…
#> 6 V6 P04196 Hex(5)HexNAc(4)Ne… NeuAc(??-?)Hex(… 344 HRG comp…
#> 7 V7 P04196 Hex(5)HexNAc(4) Hex(??-?)HexNAc… 345 HRG comp…
#> 8 V8 P04196 Hex(5)HexNAc(4)dH… dHex(??-?)Hex(?… 344 HRG comp…
#> 9 V9 P04196 Hex(4)HexNAc(3) Hex(??-?)HexNAc… 344 HRG comp…
#> 10 V10 P04196 Hex(4)HexNAc(4)Ne… NeuAc(??-?)Hex(… 344 HRG comp…
#> # ℹ 3,870 more rows
#> # ℹ 9 more variables: B <lgl>, nA <int>, nF <int>, nFc <int>, nFa <int>,
#> #   nG <int>, nGt <int>, nS <int>, nM <int>
✨ Look at that transformation! Your variable information is now enriched with ten meta-property columns (Tp through nM). This opens up powerful filtering possibilities based on structural features.
For instance, let’s filter for all glycoforms containing high-mannose glycans:
exp_with_mp |>
filter_var(Tp == "highmannose")
#>
#> ── Glycoproteomics Experiment ──────────────────────────────────────────────────
#> ℹ Expression matrix: 12 samples, 207 variables
#> ℹ Sample information fields: group <fct>
#> ℹ Variable information fields: protein <chr>, glycan_composition <comp>, glycan_structure <struct>, protein_site <int>, gene <chr>, Tp <fct>, B <lgl>, nA <int>, nF <int>, nFc <int>, nFa <int>, nG <int>, nGt <int>, nS <int>, nM <int>
🧰 Meta-Property Functions: Your Structural Toolkit
Behind the scenes, meta-properties are actually functions that take
glyrepr::glycan_structure() vectors and return
corresponding property values. glydet comes packed with a
comprehensive library of built-in meta-property functions:
names(all_mp_fns())
#> [1] "Tp"  "B"   "nA"  "nF"  "nFc" "nFa" "nG"  "nGt" "nS"  "nM"
📚 Your complete toolkit: Here’s the full roster of built-in meta-property functions:
| Name | Function | Description |
|---|---|---|
| Tp | n_glycan_type() | Type of the glycan: “complex”, “hybrid”, “highmannose”, or “paucimannose” |
| B | has_bisecting() | Whether the glycan has a bisecting GlcNAc |
| nA | n_antennae() | Number of antennae |
| nF | n_fuc() | Number of fucoses |
| nFc | n_core_fuc() | Number of core fucoses |
| nFa | n_arm_fuc() | Number of arm fucoses |
| nG | n_gal() | Number of galactoses |
| nGt | n_terminal_gal() | Number of terminal galactoses |
| nS | n_sia() | Number of sialic acids |
| nM | n_man() | Number of mannoses |
Each function can be called directly for quick structural analysis:
n_glycan_type(glycans)
#> [1] complex hybrid complex complex complex
#> Levels: paucimannose hybrid highmannose complex
🧩 Working with Structural Ambiguity
An important design principle of glydet is its ability
to handle glycan structures with varying levels of detail. All built-in
meta-properties and derived traits are designed to work with the
minimum information typically available for N-glycans
in most experimental scenarios.
🔧 Generic vs. Specific Monosaccharides
glydet works seamlessly with generic monosaccharide
names (e.g., “Hex”, “HexNAc”, “dHex”) and structures lacking linkage
information. This level of structural resolution reflects what is
commonly achievable in glycoproteomics workflows, where complete
structural determination is often challenging.
For example, this ambiguous structure works perfectly:
# Generic monosaccharides with unknown linkages ❓
ambiguous_glycan <- "HexNAc(??-?)Hex(??-?)[Hex(??-?)]Hex(??-?)HexNAc(??-?)[dHex(??-?)]HexNAc(??-"
✨ Handling Detailed Structures
This design philosophy doesn’t limit glydet’s
applicability to well-characterized structures. The package equally
handles glycans with complete structural information:
# Fully specified structure with specific monosaccharides and linkages ✅
detailed_glycan <- "GlcNAc(b1-2)Man(a1-3)[Man(a1-6)]Man(b1-4)GlcNAc(b1-4)[Fuc(a1-6)]GlcNAc(b1-"
🚀 Extending Functionality
When working with highly detailed structural information, you may want to create specialized meta-property functions that leverage specific monosaccharide identities or linkage patterns. This allows you to define custom derived traits that capture structural features beyond the generic framework provided by the built-in functions.
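As a rough illustration, a meta-property function simply maps a vector of structures to one value per structure. The hypothetical functions n_fuc_any() and n_fuc_a16() below work on plain IUPAC-condensed strings with regular expressions, a simplification swapped in for glydet’s parsed glyrepr::glycan_structure() vectors. n_fuc_any() accepts both the generic name dHex and the specific name Fuc, so it works at either level of detail, while n_fuc_a16() relies on linkage information and therefore returns 0 for the ambiguous structure.

```r
# Hypothetical, string-based meta-property functions (not glydet's API).
# Structures are plain IUPAC-condensed strings, not glycan_structure objects.
ambiguous_glycan <- "HexNAc(??-?)Hex(??-?)[Hex(??-?)]Hex(??-?)HexNAc(??-?)[dHex(??-?)]HexNAc(??-"
detailed_glycan  <- "GlcNAc(b1-2)Man(a1-3)[Man(a1-6)]Man(b1-4)GlcNAc(b1-4)[Fuc(a1-6)]GlcNAc(b1-"
structures <- c(ambiguous = ambiguous_glycan, detailed = detailed_glycan)

# Count regex matches in each string (0 when there is no match)
count_matches <- function(x, pattern) {
  counts <- lengths(regmatches(x, gregexpr(pattern, x)))
  setNames(counts, names(x))
}

# Generic-friendly: in N-glycans a fucose may be written as "dHex" or "Fuc"
n_fuc_any <- function(x) count_matches(x, "(dHex|Fuc)\\(")

# Detail-specific: only alpha1-6-linked Fuc (core fucose in N-glycans) counts
n_fuc_a16 <- function(x) count_matches(x, "Fuc\\(a1-6\\)")

n_fuc_any(structures)  # ambiguous: 1, detailed: 1
n_fuc_a16(structures)  # ambiguous: 0, detailed: 1
```

Regex counting ignores the tree topology (it cannot tell which residue a fucose is attached to), so a real implementation should operate on parsed structures rather than strings.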
Working with Glycomics Data
Working with glycomics data is no different from working with
glycoproteomics data. It is even more straightforward, because the
resulting experiment() has a simpler structure: there are no glycosites,
so each trait is calculated once per sample over the whole glycome. Here
we briefly demonstrate how to work with glycomics data using
glyexp::real_experiment2.
exp <- auto_clean(real_experiment2)
#> ℹ Normalizing data (Median Quotient)
#> ✔ Normalizing data (Median Quotient) [14ms]
#>
#> ℹ Removing variables with >50% missing values
#> ✔ Removing variables with >50% missing values [14ms]
#>
#> ℹ Imputing missing values
#> ℹ Sample size > 100, using MissForest imputation
#> ✔ Imputing missing values [6s]
#>
#> ℹ Normalizing data (Total Area)
#> ✔ Normalizing data (Total Area) [13ms]
trait_exp <- derive_traits(exp)
trait_exp
#>
#> ── Traitomics Experiment ───────────────────────────────────────────────────────
#> ℹ Expression matrix: 144 samples, 14 variables
#> ℹ Sample information fields: group <fct>
#> ℹ Variable information fields: trait <chr>
What’s Next?
Now you have a good understanding of glydet and how to
use it. Next, try all_traits() to calculate more advanced
and detailed derived traits, or start defining your own
meta-property functions and derived traits; the Custom
Traits vignette shows how. You can also explore a special
type of derived trait: motif quantification.