
Get Started with glydet
🧬 N-glycans are among nature’s most fascinating molecular structures—ubiquitous yet highly constrained by strict biosynthesis rules. This structural discipline creates a unique opportunity: we can describe entire N-glycomes using carefully designed variables, such as the proportion of core-fucosylated glycans, the fraction of sialylated structures, or the average degree of sialylation per galactose.
In the N-glycomics world, these descriptive variables are known as “derived traits” — powerful summaries that capture biological meaning far beyond simple glycan abundances (the so-called “direct traits”).
Enter glydet 🚀, your toolkit for calculating derived traits with unprecedented precision. But here’s where it gets exciting: glydet brings the power of derived traits to the glycoproteomics community for the first time, enabling site-specific trait analysis. Plus, it features an intuitive domain-specific language that lets you define custom traits tailored to your research needs.
Important Note: This package is built on the glyrepr package, and heavily relies on the glyexp package. If you are not familiar with these two packages, we highly recommend checking out their introductions first.
library(glydet)
library(glyrepr)
library(glyexp)
library(glyclean)
#>
#> Attaching package: 'glyclean'
#> The following object is masked from 'package:stats':
#>
#> aggregate
library(glystats)
🎯 Dive Right In: Your First Analysis
Ready to see glydet in action? Let’s jump straight into a real-world example that demonstrates its power! We’ll work with glyexp::real_experiment, an authentic N-glycoproteomics dataset from 12 patients with varying liver conditions.
⚠️ Pro tip: Always preprocess your data with glyclean before diving into trait analysis. This ensures your results are as clean and reliable as your data!
exp <- auto_clean(real_experiment) # Preprocess the data
#> ℹ Normalizing data (Median)
#> ✔ Normalizing data (Median) [137ms]
#>
#> ℹ Removing variables with >50% missing values
#> ✔ Removing variables with >50% missing values [74ms]
#>
#> ℹ Imputing missing values
#> ℹ Sample size <= 30, using sample minimum imputation
#> ✔ Imputing missing values [24ms]
#>
#> ℹ Aggregating data
#> ✔ Aggregating data [868ms]
#>
#> ℹ Normalizing data again
#> ✔ Normalizing data again [17ms]
exp
#>
#> ── Glycoproteomics Experiment ──────────────────────────────────────────────────
#> ℹ Expression matrix: 12 samples, 3880 variables
#> ℹ Sample information fields: group <chr>
#> ℹ Variable information fields: protein <chr>, gene <chr>, glycan_composition <comp>, glycan_structure <struct>, protein_site <int>
Let’s take a quick peek at our dataset to understand what we’re working with:
get_var_info(exp)
#> # A tibble: 3,880 × 6
#> variable protein gene glycan_composition
#> <chr> <chr> <chr> <comp>
#> 1 V1 P08185 SERPINA6 Hex(5)HexNAc(4)NeuAc(2)
#> 2 V2 P04196 HRG Hex(5)HexNAc(4)NeuAc(1)
#> 3 V3 P04196 HRG Hex(5)HexNAc(4)
#> 4 V4 P04196 HRG Hex(5)HexNAc(4)NeuAc(1)
#> 5 V5 P10909 CLU Hex(6)HexNAc(5)
#> 6 V6 P04196 HRG Hex(5)HexNAc(4)NeuAc(2)
#> 7 V7 P04196 HRG Hex(5)HexNAc(4)
#> 8 V8 P04196 HRG Hex(5)HexNAc(4)dHex(2)
#> 9 V9 P04196 HRG Hex(4)HexNAc(3)
#> 10 V10 P04196 HRG Hex(4)HexNAc(4)NeuAc(1)
#> # ℹ 3,870 more rows
#> # ℹ 2 more variables: glycan_structure <struct>, protein_site <int>
get_sample_info(exp)
#> # A tibble: 12 × 2
#> sample group
#> <chr> <chr>
#> 1 C1 C
#> 2 C2 C
#> 3 C3 C
#> 4 H1 H
#> 5 H2 H
#> 6 H3 H
#> 7 M1 M
#> 8 M2 M
#> 9 M3 M
#> 10 Y1 Y
#> 11 Y2 Y
#> 12 Y3 Y
get_expr_mat(exp)[1:5, 1:5]
#> C1 C2 C3 H1 H2
#> V1 6.626760e+03 2.019159e+04 13432.7 4.072473e+04 1.771879e+04
#> V2 3.744595e+08 5.691652e+08 99531624.5 2.372164e+04 1.422307e+07
#> V3 5.260619e+08 5.644547e+08 211645556.7 9.149818e+08 8.534716e+08
#> V4 2.983928e+09 2.665752e+09 1207235166.5 3.410355e+09 3.918161e+09
#> V5 2.751569e+07 3.200443e+07 8055532.6 6.765746e+07 4.546455e+07
Now for the magic moment ✨—let’s calculate some derived traits!
trait_exp <- derive_traits(exp)
trait_exp
#>
#> ── Traitproteomics Experiment ──────────────────────────────────────────────────
#> ℹ Expression matrix: 12 samples, 3836 variables
#> ℹ Sample information fields: group <chr>
#> ℹ Variable information fields: protein <chr>, protein_site <int>, trait <chr>, gene <chr>
Voilà! What you see is a brand new experiment() object with “traitproteomics” type. Think of it as your original dataset’s sophisticated cousin 🎭: instead of tracking “quantification of each glycan on each glycosite in each sample,” it now contains “the value of each derived trait on each glycosite in each sample.”
get_var_info(trait_exp)
#> # A tibble: 3,836 × 5
#> variable protein protein_site trait gene
#> <chr> <chr> <int> <chr> <chr>
#> 1 V1 A6NJW9 49 TM CD8B2
#> 2 V2 A6NJW9 49 TH CD8B2
#> 3 V3 A6NJW9 49 TC CD8B2
#> 4 V4 A6NJW9 49 MM CD8B2
#> 5 V5 A6NJW9 49 CA2 CD8B2
#> 6 V6 A6NJW9 49 CA3 CD8B2
#> 7 V7 A6NJW9 49 CA4 CD8B2
#> 8 V8 A6NJW9 49 TF CD8B2
#> 9 V9 A6NJW9 49 TFc CD8B2
#> 10 V10 A6NJW9 49 TFa CD8B2
#> # ℹ 3,826 more rows
# These are the trait values!
get_expr_mat(trait_exp)[1:5, 1:5]
#> C1 C2 C3 H1 H2
#> V1 0 0 0 0 0
#> V2 0 0 0 0 0
#> V3 1 1 1 1 1
#> V4 NA NA NA NA NA
#> V5 1 1 1 1 1
🎉 Congratulations! You’ve just calculated a comprehensive suite of derived traits in a site-specific manner:
- TM: Proportion of high-mannose glycans
- TH: Proportion of hybrid glycans
- TC: Proportion of complex glycans
- MM: Average number of mannoses within high-mannose glycans
- CA2: Proportion of bi-antennary glycans within complex glycans
- CA3: Proportion of tri-antennary glycans within complex glycans
- CA4: Proportion of tetra-antennary glycans within complex glycans
- TF: Proportion of fucosylated glycans
- TFc: Proportion of core-fucosylated glycans
- TFa: Proportion of arm-fucosylated glycans
- TB: Proportion of glycans with bisecting GlcNAc
- SG: Average degree of sialylation per galactose
- GA: Average degree of galactosylation per antenna
- TS: Proportion of sialylated glycans
💡 The key insight: We treat the glycans on each glycosite as a separate mini-glycome, then calculate derived traits for each one across all samples. For instance, if a particular glycosite hosts 10 different glycans, the TFc value represents the proportion of core-fucosylated glycans within those 10 structures in each sample.
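To make the mini-glycome idea concrete, here is a minimal base-R sketch for a single toy glycosite. It is not glydet's implementation: the glycan names, the abundances, and the assumption that the proportion is weighted by glycan abundance are all illustrative.

```r
# Toy glycosite hosting four glycans (all names and numbers are made up).
# Number of core fucoses per glycan (a structural property of each glycan).
n_core_fuc <- c(G1 = 1, G2 = 0, G3 = 1, G4 = 0)

# Abundance of each glycan (rows) in each sample (columns).
abundance <- matrix(
  c(10, 30, 40, 20,   # sample S1
    40, 10, 30, 20,   # sample S2
     0, 50,  0, 50),  # sample S3
  nrow = 4,
  dimnames = list(names(n_core_fuc), c("S1", "S2", "S3"))
)

# Assumed definition: abundance of core-fucosylated glycans divided by the
# total abundance on the site, computed separately for each sample.
is_core_fuc <- n_core_fuc > 0
tfc <- colSums(abundance[is_core_fuc, , drop = FALSE]) / colSums(abundance)
tfc
```

The same four glycans give a different TFc value in each sample because only their abundances change, which is why a derived trait varies across samples while the structures stay fixed.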
Now comes the fun part! 📊 You can leverage all the powerful functions in the glystats package to analyze your derived traits. Let’s demonstrate with an ANOVA analysis to identify glycosites with significantly different levels of core-fucosylation across conditions:
anova_res <- gly_anova(trait_exp)
#> ℹ Number of groups: 4
#> ℹ Groups: "C", "H", "M", and "Y"
#> Warning: 267 variables failed to fit the model
anova_res$tidy_result$main_test |>
  dplyr::filter(trait == "TFc", p_adj < 0.05)
#> # A tibble: 12 × 13
#> variable protein protein_site trait gene term df sumsq meansq
#> <chr> <chr> <int> <chr> <chr> <chr> <dbl> <dbl> <dbl>
#> 1 V457 P00748 249 TFc F12 group 3 0.000548 0.000183
#> 2 V709 P01591 71 TFc JCHAIN group 3 0.0771 0.0257
#> 3 V919 P02679 78 TFc FGG group 3 0.00365 0.00122
#> 4 V1115 P02765 176 TFc AHSG group 3 0.0000941 0.0000314
#> 5 V1227 P02790 240 TFc HPX group 3 0.0629 0.0210
#> 6 V1353 P03952 494 TFc KLKB1 group 3 0.00231 0.000770
#> 7 V1381 P04004 86 TFc VTN group 3 0.00640 0.00213
#> 8 V1661 P04278 396 TFc SHBG group 3 0.0299 0.00998
#> 9 V1675 P05090 98 TFc APOD group 3 0.0174 0.00581
#> 10 V2165 P0C0L4 1328 TFc C4A group 3 0.0174 0.00581
#> 11 V2487 P19652 103 TFc ORM2 group 3 0.0644 0.0215
#> 12 V2837 P43652 33 TFc AFM group 3 0.00547 0.00182
#> # ℹ 4 more variables: statistic <dbl>, p_value <dbl>, p_adj <dbl>,
#> # post_hoc <chr>
🔍 Discovery time! We’ve identified 12 glycosites with statistically significant differences (adjusted p < 0.05) in core-fucosylation levels across our patient groups. These are exactly the kind of biological insights that make derived traits so powerful!
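gly_anova() does the heavy lifting here. For intuition, the following base-R sketch shows what a per-variable one-way ANOVA with multiple-testing correction can look like. It is not the glystats implementation: the toy trait matrix, the variable names, and the choice of Benjamini-Hochberg adjustment are assumptions made for illustration.

```r
set.seed(1)

# Four groups with three samples each, mirroring the study design above.
groups <- factor(rep(c("C", "H", "M", "Y"), each = 3))

# Toy trait matrix: 3 variables (rows) x 12 samples (columns).
# Only V2 carries a built-in group effect.
trait_mat <- rbind(
  V1 = rnorm(12, mean = 0.5, sd = 0.05),
  V2 = rnorm(12, mean = 0.5, sd = 0.05) + rep(c(0, 0.2, 0.4, 0.6), each = 3),
  V3 = rnorm(12, mean = 0.3, sd = 0.05)
)

# Fit one ANOVA per variable and extract the p-value of the group term.
p_values <- apply(trait_mat, 1, function(y) {
  fit <- aov(y ~ groups)
  summary(fit)[[1]][["Pr(>F)"]][1]
})

# Adjust for multiple testing (method is an assumption, see above).
p_adj <- p.adjust(p_values, method = "BH")
p_adj
```

Filtering on the adjusted rather than the raw p-value matters here, because thousands of site-specific traits are tested at once.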
🔧 Under the Hood: Understanding Meta-Properties
Curious about how the magic happens? Let’s lift the hood and explore glydet’s inner workings. Don’t worry, we’ll keep things accessible!
The key concept you need to understand is “meta-properties” - think of them as the molecular fingerprints of individual glycans.
🆚 What’s the difference?
- Derived traits describe entire glycomes (or all glycans on a glycosite) and their values fluctuate across samples
- Meta-properties describe individual glycans regardless of their abundance, like counting antennae, core fucoses, or sialic acids on a single structure
🧠 The connection: Meta-properties are the building blocks for derived traits. When you call derive_traits(), glydet automatically calculates meta-properties for all glycans first, then uses this information to compute the derived traits you see.
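The two-step flow can be sketched in base R with toy data. This is only an illustration under stated assumptions, not glydet's internal code: weighted_trait() is a hypothetical helper, the glycans and abundances are made up, and abundance weighting is assumed.

```r
# Step 1: a meta-property table, one row per glycan (toy values).
mp <- data.frame(
  glycan = c("G1", "G2", "G3"),
  Tp     = c("highmannose", "highmannose", "complex"),
  nM     = c(5, 9, 3)
)

# Abundances of the same glycans (rows) in two samples (columns).
abundance <- cbind(S1 = c(30, 10, 60), S2 = c(0, 0, 100))

# Step 2: hypothetical helper that computes the abundance-weighted mean of a
# meta-property, restricted to the glycans selected by `within`.
weighted_trait <- function(values, abundance, within) {
  w <- abundance[within, , drop = FALSE]
  colSums(w * values[within]) / colSums(w)
}

# An MM-like trait: average number of mannoses within high-mannose glycans.
mm <- weighted_trait(mp$nM, abundance, within = mp$Tp == "highmannose")
mm
```

In sample S2 there is no high-mannose signal, so the sketch divides zero by zero and returns NaN. A trait that is conditioned on a glycan subclass may be undefined in the same way when that subclass is absent from a site.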
Want to work with meta-properties directly? 🛠️ You’re in luck! glydet provides two handy functions:
- get_meta_properties(): Calculate meta-properties for any set of glycans
- add_meta_properties(): Enrich your experiment() object by adding meta-properties to variable information
🔬 get_meta_properties()
Let’s see get_meta_properties() in action! We’ll extract a few glycan structures from our dataset:
glycans <- unique(get_var_info(exp)$glycan_structure)[1:5]
glycans
#> <glycan_structure[5]>
#> [1] NeuAc(??-?)Hex(??-?)HexNAc(??-?)Hex(??-?)[NeuAc(??-?)Hex(??-?)HexNAc(??-?)Hex(??-?)]Hex(??-?)HexNAc(??-?)HexNAc(??-
#> [2] NeuAc(??-?)Hex(??-?)HexNAc(??-?)[HexNAc(??-?)]Hex(??-?)[Hex(??-?)Hex(??-?)]Hex(??-?)HexNAc(??-?)HexNAc(??-
#> [3] Hex(??-?)HexNAc(??-?)Hex(??-?)[Hex(??-?)HexNAc(??-?)Hex(??-?)]Hex(??-?)HexNAc(??-?)HexNAc(??-
#> [4] NeuAc(??-?)Hex(??-?)HexNAc(??-?)Hex(??-?)[Hex(??-?)HexNAc(??-?)Hex(??-?)]Hex(??-?)HexNAc(??-?)HexNAc(??-
#> [5] Hex(??-?)HexNAc(??-?)Hex(??-?)HexNAc(??-?)Hex(??-?)[Hex(??-?)HexNAc(??-?)Hex(??-?)]Hex(??-?)HexNAc(??-?)HexNAc(??-
#> # Unique structures: 5
📝 Note: glycans is a glyrepr::glycan_structure() vector. These are standardized representations of glycan structures.
Now watch the magic happen as we calculate their meta-properties:
get_meta_properties(glycans)
#> # A tibble: 5 × 10
#> Tp B nA nF nFc nFa nG nGt nS nM
#> <fct> <lgl> <int> <int> <int> <int> <int> <int> <int> <int>
#> 1 complex FALSE 2 0 0 0 2 0 2 3
#> 2 hybrid FALSE 2 0 0 0 1 0 1 4
#> 3 complex FALSE 2 0 0 0 2 2 0 3
#> 4 complex FALSE 2 0 0 0 2 1 1 3
#> 5 complex FALSE 2 0 0 0 2 1 0 4
📈 add_meta_properties()
Working with glyexp::experiment() objects? Perfect! You can supercharge your variable information by adding meta-properties directly:
exp_with_mp <- add_meta_properties(exp)
get_var_info(exp_with_mp)
#> # A tibble: 3,880 × 16
#> variable protein gene glycan_composition
#> <chr> <chr> <chr> <comp>
#> 1 V1 P08185 SERPINA6 Hex(5)HexNAc(4)NeuAc(2)
#> 2 V2 P04196 HRG Hex(5)HexNAc(4)NeuAc(1)
#> 3 V3 P04196 HRG Hex(5)HexNAc(4)
#> 4 V4 P04196 HRG Hex(5)HexNAc(4)NeuAc(1)
#> 5 V5 P10909 CLU Hex(6)HexNAc(5)
#> 6 V6 P04196 HRG Hex(5)HexNAc(4)NeuAc(2)
#> 7 V7 P04196 HRG Hex(5)HexNAc(4)
#> 8 V8 P04196 HRG Hex(5)HexNAc(4)dHex(2)
#> 9 V9 P04196 HRG Hex(4)HexNAc(3)
#> 10 V10 P04196 HRG Hex(4)HexNAc(4)NeuAc(1)
#> # ℹ 3,870 more rows
#> # ℹ 12 more variables: glycan_structure <struct>, protein_site <int>, Tp <fct>,
#> # B <lgl>, nA <int>, nF <int>, nFc <int>, nFa <int>, nG <int>, nGt <int>,
#> # nS <int>, nM <int>
✨ Look at that transformation! Your variable information is now enriched with multiple meta-property columns. This opens up powerful filtering possibilities based on structural features.
For instance, let’s filter for all glycoforms containing high-mannose glycans:
exp_with_mp |>
  filter_var(Tp == "highmannose")
#>
#> ── Glycoproteomics Experiment ──────────────────────────────────────────────────
#> ℹ Expression matrix: 12 samples, 207 variables
#> ℹ Sample information fields: group <chr>
#> ℹ Variable information fields: protein <chr>, gene <chr>, glycan_composition <comp>, glycan_structure <struct>, protein_site <int>, Tp <fct>, B <lgl>, nA <int>, nF <int>, nFc <int>, nFa <int>, nG <int>, nGt <int>, nS <int>, nM <int>
🧰 Meta-Property Functions: Your Structural Toolkit
Behind the scenes, meta-properties are actually functions that take glyrepr::glycan_structure() vectors and return corresponding property values. glydet comes packed with a comprehensive library of built-in meta-property functions:
names(all_mp_fns())
#> [1] "Tp" "B" "nA" "nF" "nFc" "nFa" "nG" "nGt" "nS" "nM"
📚 Your complete toolkit: Here’s the full roster of built-in meta-property functions:
Name | Function | Description |
---|---|---|
Tp | n_glycan_type() | Type of the glycan: “complex”, “hybrid”, “highmannose”, or “paucimannose” |
B | has_bisecting() | Whether the glycan has a bisecting GlcNAc |
nA | n_antennae() | Number of antennae |
nF | n_fuc() | Number of fucoses |
nFc | n_core_fuc() | Number of core fucoses |
nFa | n_arm_fuc() | Number of arm fucoses |
nG | n_gal() | Number of galactoses |
nGt | n_terminal_gal() | Number of terminal galactoses |
nS | n_sia() | Number of sialic acids |
nM | n_man() | Number of mannoses |
Each function can be called directly for quick structural analysis:
n_glycan_type(glycans)
#> [1] complex hybrid complex complex complex
#> Levels: paucimannose hybrid highmannose complex
🧩 Working with Structural Ambiguity
An important design principle of glydet is its ability to handle glycan structures with varying levels of detail. All built-in meta-properties and derived traits are designed to work with the minimum information typically available for N-glycans in most experimental scenarios.
🔧 Generic vs. Specific Monosaccharides
glydet works seamlessly with generic monosaccharide names (e.g., “Hex”, “HexNAc”, “dHex”) and structures lacking linkage information. This level of structural resolution reflects what is commonly achievable in glycoproteomics workflows, where complete structural determination is often challenging.
For example, this ambiguous structure works perfectly:
# Generic monosaccharides with unknown linkages ❓
ambiguous_glycan <- "HexNAc(??-?)Hex(??-?)[Hex(??-?)]Hex(??-?)HexNAc(??-?)[dHex(??-?)]HexNAc(??-"
✨ Handling Detailed Structures
This design philosophy doesn’t limit glydet’s applicability to well-characterized structures. The package equally handles glycans with complete structural information:
# Fully specified structure with specific monosaccharides and linkages ✅
detailed_glycan <- "GlcNAc(b1-2)Man(a1-3)[Man(a1-6)]Man(b1-4)GlcNAc(b1-4)[Fuc(a1-6)]GlcNAc(b1-"
🚀 Extending Functionality
When working with highly detailed structural information, you may want to create specialized meta-property functions that leverage specific monosaccharide identities or linkage patterns. This allows you to define custom derived traits that capture structural features beyond the generic framework provided by the built-in functions.
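As a rough illustration of why specific monosaccharide names add information, the sketch below counts residues directly in IUPAC-condensed strings. count_residue() is a hypothetical helper, not part of glydet, and it works on plain character strings rather than glyrepr::glycan_structure() vectors. Real custom meta-property functions should follow the Custom Traits vignette.

```r
# Hypothetical helper (not part of glydet): count how often a residue name
# occurs in each IUPAC-condensed string.
count_residue <- function(structures, residue) {
  # Match the residue name only when it starts a token and is followed by "(",
  # so that "Hex" matches neither "dHex(" nor "HexNAc(".
  pattern <- paste0("(?<![A-Za-z0-9])", residue, "\\(")
  hits <- gregexpr(pattern, structures, perl = TRUE)
  # gregexpr() returns -1 when nothing matches, so count positive positions.
  vapply(hits, function(m) sum(m > 0), integer(1))
}

structures <- c(
  generic  = "HexNAc(??-?)Hex(??-?)[Hex(??-?)]Hex(??-?)HexNAc(??-?)[dHex(??-?)]HexNAc(??-",
  detailed = "GlcNAc(b1-2)Man(a1-3)[Man(a1-6)]Man(b1-4)GlcNAc(b1-4)[Fuc(a1-6)]GlcNAc(b1-"
)

count_residue(structures, "Hex")  # generic hexoses
count_residue(structures, "Man")  # mannoses, named only in the detailed string
count_residue(structures, "Fuc")  # fucoses, named only in the detailed string
```

The generic string can only say that three hexoses are present, while the detailed string identifies them as mannoses. A specialized meta-property can exploit exactly this kind of extra detail.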
Working with Glycomics Data
Working with glycomics data is no different from working with glycoproteomics data. If anything, it is more straightforward, because the resulting experiment() has a simpler structure. Here we briefly demonstrate the workflow using glyexp::real_experiment2.
exp <- auto_clean(real_experiment2)
#> ℹ Normalizing data (Median Quotient)
#> ✔ Normalizing data (Median Quotient) [15ms]
#>
#> ℹ Removing variables with >50% missing values
#> ✔ Removing variables with >50% missing values [14ms]
#>
#> ℹ Imputing missing values
#> ℹ Sample size > 100, using MissForest imputation
#> ✔ Imputing missing values [15.8s]
#>
#> ℹ Normalizing data (Total Area)
#> ✔ Normalizing data (Total Area) [13ms]
trait_exp <- derive_traits(exp)
trait_exp
#>
#> ── Traitomics Experiment ───────────────────────────────────────────────────────
#> ℹ Expression matrix: 144 samples, 14 variables
#> ℹ Sample information fields: group <chr>
#> ℹ Variable information fields: trait <chr>
What’s Next?
Now you have a good understanding of glydet and how to use it. You can try all_traits() to calculate more advanced and detailed derived traits. You can also start to define your own meta-property functions and derived traits. Check out the Custom Traits vignette to learn how to define your own derived traits.