
🧬 N-glycans are among nature’s most fascinating molecular structures—ubiquitous yet highly constrained by strict biosynthesis rules. This structural discipline creates a unique opportunity: we can describe entire N-glycomes using carefully designed variables, such as the proportion of core-fucosylated glycans, the fraction of sialylated structures, or the average degree of sialylation per galactose.

In the N-glycomics world, these descriptive variables are known as “derived traits” — powerful summaries that capture biological meaning far beyond simple glycan abundances (the so-called “direct traits”).

Enter glydet 🚀—your toolkit for calculating derived traits with unprecedented precision. But here’s where it gets exciting: glydet brings the power of derived traits to the glycoproteomics community for the first time, enabling site-specific trait analysis. Plus, it features an intuitive domain-specific language that lets you define custom traits tailored to your research needs.

Important Note: This package is built on the glyrepr package, and heavily relies on the glyexp package. If you are not familiar with these two packages, we highly recommend checking out their introductions first.

library(glydet)
library(glyrepr)
library(glyexp)
library(glyclean)
#> 
#> Attaching package: 'glyclean'
#> The following object is masked from 'package:stats':
#> 
#>     aggregate
library(glystats)

🎯 Dive Right In: Your First Analysis

Ready to see glydet in action? Let’s jump straight into a real-world example that demonstrates its power! We’ll work with glyexp::real_experiment — an authentic N-glycoproteomics dataset from 12 patients with varying liver conditions.

⚠️ Pro tip: Always preprocess your data with glyclean before diving into trait analysis. This ensures your results are as clean and reliable as your data!

exp <- auto_clean(real_experiment)  # Preprocess the data
#>  Normalizing data (Median)
#>  Normalizing data (Median) [137ms]
#> 
#>  Removing variables with >50% missing values
#>  Removing variables with >50% missing values [74ms]
#> 
#>  Imputing missing values
#>  Sample size <= 30, using sample minimum imputation
#>  Imputing missing values [24ms]
#> 
#>  Aggregating data
#>  Aggregating data [868ms]
#> 
#>  Normalizing data again
#>  Normalizing data again [17ms]
exp
#> 
#> ── Glycoproteomics Experiment ──────────────────────────────────────────────────
#>  Expression matrix: 12 samples, 3880 variables
#>  Sample information fields: group <chr>
#>  Variable information fields: protein <chr>, gene <chr>, glycan_composition <comp>, glycan_structure <struct>, protein_site <int>

Let’s take a quick peek at our dataset to understand what we’re working with:

get_var_info(exp)
#> # A tibble: 3,880 × 6
#>    variable protein gene     glycan_composition     
#>    <chr>    <chr>   <chr>    <comp>                 
#>  1 V1       P08185  SERPINA6 Hex(5)HexNAc(4)NeuAc(2)
#>  2 V2       P04196  HRG      Hex(5)HexNAc(4)NeuAc(1)
#>  3 V3       P04196  HRG      Hex(5)HexNAc(4)        
#>  4 V4       P04196  HRG      Hex(5)HexNAc(4)NeuAc(1)
#>  5 V5       P10909  CLU      Hex(6)HexNAc(5)        
#>  6 V6       P04196  HRG      Hex(5)HexNAc(4)NeuAc(2)
#>  7 V7       P04196  HRG      Hex(5)HexNAc(4)        
#>  8 V8       P04196  HRG      Hex(5)HexNAc(4)dHex(2) 
#>  9 V9       P04196  HRG      Hex(4)HexNAc(3)        
#> 10 V10      P04196  HRG      Hex(4)HexNAc(4)NeuAc(1)
#> # ℹ 3,870 more rows
#> # ℹ 2 more variables: glycan_structure <struct>, protein_site <int>
get_sample_info(exp)
#> # A tibble: 12 × 2
#>    sample group
#>    <chr>  <chr>
#>  1 C1     C    
#>  2 C2     C    
#>  3 C3     C    
#>  4 H1     H    
#>  5 H2     H    
#>  6 H3     H    
#>  7 M1     M    
#>  8 M2     M    
#>  9 M3     M    
#> 10 Y1     Y    
#> 11 Y2     Y    
#> 12 Y3     Y
get_expr_mat(exp)[1:5, 1:5]
#>              C1           C2           C3           H1           H2
#> V1 6.626760e+03 2.019159e+04      13432.7 4.072473e+04 1.771879e+04
#> V2 3.744595e+08 5.691652e+08   99531624.5 2.372164e+04 1.422307e+07
#> V3 5.260619e+08 5.644547e+08  211645556.7 9.149818e+08 8.534716e+08
#> V4 2.983928e+09 2.665752e+09 1207235166.5 3.410355e+09 3.918161e+09
#> V5 2.751569e+07 3.200443e+07    8055532.6 6.765746e+07 4.546455e+07

Now for the magic moment ✨—let’s calculate some derived traits!

trait_exp <- derive_traits(exp)
trait_exp
#> 
#> ── Traitproteomics Experiment ──────────────────────────────────────────────────
#>  Expression matrix: 12 samples, 3836 variables
#>  Sample information fields: group <chr>
#>  Variable information fields: protein <chr>, protein_site <int>, trait <chr>, gene <chr>

Voilà! What you see is a brand-new experiment() object of the “traitproteomics” type. Think of it as your original dataset’s sophisticated cousin 🎭: instead of tracking “quantification of each glycan on each glycosite in each sample,” it now contains “the value of each derived trait on each glycosite in each sample.”

get_var_info(trait_exp)
#> # A tibble: 3,836 × 5
#>    variable protein protein_site trait gene 
#>    <chr>    <chr>          <int> <chr> <chr>
#>  1 V1       A6NJW9            49 TM    CD8B2
#>  2 V2       A6NJW9            49 TH    CD8B2
#>  3 V3       A6NJW9            49 TC    CD8B2
#>  4 V4       A6NJW9            49 MM    CD8B2
#>  5 V5       A6NJW9            49 CA2   CD8B2
#>  6 V6       A6NJW9            49 CA3   CD8B2
#>  7 V7       A6NJW9            49 CA4   CD8B2
#>  8 V8       A6NJW9            49 TF    CD8B2
#>  9 V9       A6NJW9            49 TFc   CD8B2
#> 10 V10      A6NJW9            49 TFa   CD8B2
#> # ℹ 3,826 more rows
# These are the trait values!
get_expr_mat(trait_exp)[1:5, 1:5]
#>    C1 C2 C3 H1 H2
#> V1  0  0  0  0  0
#> V2  0  0  0  0  0
#> V3  1  1  1  1  1
#> V4 NA NA NA NA NA
#> V5  1  1  1  1  1

🎉 Congratulations! You’ve just calculated a comprehensive suite of derived traits in a site-specific manner:

  • TM: Proportion of high-mannose glycans
  • TH: Proportion of hybrid glycans
  • TC: Proportion of complex glycans
  • MM: Average number of mannoses within high-mannose glycans
  • CA2: Proportion of bi-antennary glycans within complex glycans
  • CA3: Proportion of tri-antennary glycans within complex glycans
  • CA4: Proportion of tetra-antennary glycans within complex glycans
  • TF: Proportion of fucosylated glycans
  • TFc: Proportion of core-fucosylated glycans
  • TFa: Proportion of arm-fucosylated glycans
  • TB: Proportion of glycans with bisecting GlcNAc
  • SG: Average degree of sialylation per galactose
  • GA: Average degree of galactosylation per antenna
  • TS: Proportion of sialylated glycans

💡 The key insight: We treat the glycans on each glycosite as a separate mini-glycome, then calculate derived traits for each one across all samples. For instance, if a particular glycosite hosts 10 different glycans, the TFc value represents the proportion of core-fucosylated glycans within those 10 structures in each sample.
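To make this concrete, here is a minimal base-R sketch of a TFc-style calculation for a single glycosite. The abundance matrix and the core_fuc flags are made-up illustration data, and the sketch assumes the proportion is abundance-weighted (summed abundance of core-fucosylated glycans divided by total abundance on the site). It is not glydet’s actual implementation.

```r
# Hypothetical abundances: 4 glycans on one glycosite, measured in 3 samples
abund <- matrix(
  c(10, 20, 30,
    30, 20, 10,
    40, 40, 40,
    20, 20, 20),
  nrow = 4, byrow = TRUE,
  dimnames = list(c("G1", "G2", "G3", "G4"), c("S1", "S2", "S3"))
)

# Hypothetical flags: which of the four glycans carry a core fucose
core_fuc <- c(G1 = TRUE, G2 = FALSE, G3 = TRUE, G4 = FALSE)

# Assumed definition: abundance of core-fucosylated glycans / total abundance,
# computed separately for every sample (column)
tfc <- colSums(abund[core_fuc, , drop = FALSE]) / colSums(abund)
tfc
#>  S1  S2  S3 
#> 0.5 0.6 0.7
```

Each sample gets its own value because the same set of glycans contributes different abundances in each sample.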

Now comes the fun part! 📊 You can leverage all the powerful functions in the glystats package to analyze your derived traits. Let’s demonstrate with an ANOVA analysis to identify glycosites with significantly different levels of core-fucosylation across conditions:

anova_res <- gly_anova(trait_exp)
#>  Number of groups: 4
#>  Groups: "C", "H", "M", and "Y"
#> Warning: 267 variables failed to fit the model
anova_res$tidy_result$main_test |>
  dplyr::filter(trait == "TFc", p_adj < 0.05)
#> # A tibble: 12 × 13
#>    variable protein protein_site trait gene   term     df     sumsq    meansq
#>    <chr>    <chr>          <int> <chr> <chr>  <chr> <dbl>     <dbl>     <dbl>
#>  1 V457     P00748           249 TFc   F12    group     3 0.000548  0.000183 
#>  2 V709     P01591            71 TFc   JCHAIN group     3 0.0771    0.0257   
#>  3 V919     P02679            78 TFc   FGG    group     3 0.00365   0.00122  
#>  4 V1115    P02765           176 TFc   AHSG   group     3 0.0000941 0.0000314
#>  5 V1227    P02790           240 TFc   HPX    group     3 0.0629    0.0210   
#>  6 V1353    P03952           494 TFc   KLKB1  group     3 0.00231   0.000770 
#>  7 V1381    P04004            86 TFc   VTN    group     3 0.00640   0.00213  
#>  8 V1661    P04278           396 TFc   SHBG   group     3 0.0299    0.00998  
#>  9 V1675    P05090            98 TFc   APOD   group     3 0.0174    0.00581  
#> 10 V2165    P0C0L4          1328 TFc   C4A    group     3 0.0174    0.00581  
#> 11 V2487    P19652           103 TFc   ORM2   group     3 0.0644    0.0215   
#> 12 V2837    P43652            33 TFc   AFM    group     3 0.00547   0.00182  
#> # ℹ 4 more variables: statistic <dbl>, p_value <dbl>, p_adj <dbl>,
#> #   post_hoc <chr>

🔍 Discovery time! The filter returns 12 glycosites whose core-fucosylation levels differ significantly (adjusted p < 0.05) across our patient groups. This is exactly the kind of biological insight that makes derived traits so powerful!
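For intuition about what a per-trait test involves, the sketch below runs a one-way ANOVA on each row of a small, made-up trait matrix using base R only. It assumes one model per variable followed by Benjamini-Hochberg adjustment, which may differ from what gly_anova() does internally, and one_way_p() is a hypothetical helper. Rows that are entirely NA cannot be fitted, which is one plausible source of a “failed to fit” warning.

```r
# Four groups with three samples each, mirroring the design of the dataset
group <- factor(rep(c("C", "H", "M", "Y"), each = 3))

# Hypothetical trait values: one row per site-specific trait
trait_mat <- rbind(
  site_a = c(0.10, 0.12, 0.11, 0.30, 0.31, 0.29, 0.50, 0.52, 0.49, 0.70, 0.71, 0.69),
  site_b = c(0.40, 0.45, 0.35, 0.42, 0.38, 0.41, 0.39, 0.44, 0.36, 0.43, 0.37, 0.40),
  site_c = rep(NA_real_, 12)  # an undefined trait, e.g. no relevant glycans
)

# Hypothetical helper: p-value of a one-way ANOVA, or NA if it cannot be fitted
one_way_p <- function(values, group) {
  ok <- !is.na(values)
  if (sum(ok) < 3 || length(unique(group[ok])) < 2) {
    return(NA_real_)
  }
  fit <- aov(values ~ group)
  summary(fit)[[1]][["Pr(>F)"]][1]
}

p_value <- apply(trait_mat, 1, one_way_p, group = group)
p_adj <- p.adjust(p_value, method = "BH")  # adjustment method is an assumption

data.frame(p_value, p_adj)
```

In this toy data, only site_a shows a clear group effect; site_b varies within groups as much as between them, and site_c is skipped.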

🔧 Under the Hood: Understanding Meta-Properties

Curious about how the magic happens? Let’s lift the hood and explore glydet’s inner workings—but don’t worry, we’ll keep things accessible!

The key concept you need to understand is “meta-properties”: think of them as the molecular fingerprints of individual glycans.

🆚 What’s the difference?

  • Derived traits describe entire glycomes (or all glycans on a glycosite) and their values fluctuate across samples
  • Meta-properties describe individual glycans regardless of their abundance, such as the number of antennae, core fucoses, or sialic acids on a single structure

🧠 The connection: Meta-properties are the building blocks for derived traits. When you call derive_traits(), glydet automatically calculates meta-properties for all glycans first, then uses this information to compute the derived traits you see.

Want to work with meta-properties directly? 🛠️ You’re in luck! glydet provides two handy functions:

🔬 get_meta_properties()

Let’s see get_meta_properties() in action! We’ll extract a few glycan structures from our dataset:

glycans <- unique(get_var_info(exp)$glycan_structure)[1:5]
glycans
#> <glycan_structure[5]>
#> [1] NeuAc(??-?)Hex(??-?)HexNAc(??-?)Hex(??-?)[NeuAc(??-?)Hex(??-?)HexNAc(??-?)Hex(??-?)]Hex(??-?)HexNAc(??-?)HexNAc(??-
#> [2] NeuAc(??-?)Hex(??-?)HexNAc(??-?)[HexNAc(??-?)]Hex(??-?)[Hex(??-?)Hex(??-?)]Hex(??-?)HexNAc(??-?)HexNAc(??-
#> [3] Hex(??-?)HexNAc(??-?)Hex(??-?)[Hex(??-?)HexNAc(??-?)Hex(??-?)]Hex(??-?)HexNAc(??-?)HexNAc(??-
#> [4] NeuAc(??-?)Hex(??-?)HexNAc(??-?)Hex(??-?)[Hex(??-?)HexNAc(??-?)Hex(??-?)]Hex(??-?)HexNAc(??-?)HexNAc(??-
#> [5] Hex(??-?)HexNAc(??-?)Hex(??-?)HexNAc(??-?)Hex(??-?)[Hex(??-?)HexNAc(??-?)Hex(??-?)]Hex(??-?)HexNAc(??-?)HexNAc(??-
#> # Unique structures: 5

📝 Note: glycans is a glyrepr::glycan_structure() vector—these are standardized representations of glycan structures.

Now watch the magic happen as we calculate their meta-properties:

get_meta_properties(glycans)
#> # A tibble: 5 × 10
#>   Tp      B        nA    nF   nFc   nFa    nG   nGt    nS    nM
#>   <fct>   <lgl> <int> <int> <int> <int> <int> <int> <int> <int>
#> 1 complex FALSE     2     0     0     0     2     0     2     3
#> 2 hybrid  FALSE     2     0     0     0     1     0     1     4
#> 3 complex FALSE     2     0     0     0     2     2     0     3
#> 4 complex FALSE     2     0     0     0     2     1     1     3
#> 5 complex FALSE     2     0     0     0     2     1     0     4
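These columns are the raw material for derived traits. As a rough sketch, the snippet below computes an SG-like value from the nS and nG columns printed above, assuming the trait is an abundance-weighted mean of nS / nG over glycans that have at least one galactose. The abundances are invented for illustration, and the exact formula glydet uses may differ.

```r
# Meta-properties copied from the get_meta_properties() output above
mp <- data.frame(
  nG = c(2L, 1L, 2L, 2L, 2L),
  nS = c(2L, 1L, 0L, 1L, 0L)
)

# Hypothetical abundances of the five glycans in a single sample
abundance <- c(40, 10, 20, 20, 10)

# Sialic acids per galactose for each glycan; glycans without galactose
# are skipped to avoid dividing by zero
has_gal <- mp$nG > 0
sia_per_gal <- mp$nS[has_gal] / mp$nG[has_gal]

# Assumed definition: abundance-weighted mean of the per-glycan ratio
sg <- weighted.mean(sia_per_gal, w = abundance[has_gal])
sg
#> [1] 0.6
```

The meta-properties are computed once per glycan, while the abundances change from sample to sample, so repeating the last step per sample yields one trait value per sample.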

📈 add_meta_properties()

Working with glyexp::experiment() objects? Perfect! You can supercharge your variable information by adding meta-properties directly:

exp_with_mp <- add_meta_properties(exp)
get_var_info(exp_with_mp)
#> # A tibble: 3,880 × 16
#>    variable protein gene     glycan_composition     
#>    <chr>    <chr>   <chr>    <comp>                 
#>  1 V1       P08185  SERPINA6 Hex(5)HexNAc(4)NeuAc(2)
#>  2 V2       P04196  HRG      Hex(5)HexNAc(4)NeuAc(1)
#>  3 V3       P04196  HRG      Hex(5)HexNAc(4)        
#>  4 V4       P04196  HRG      Hex(5)HexNAc(4)NeuAc(1)
#>  5 V5       P10909  CLU      Hex(6)HexNAc(5)        
#>  6 V6       P04196  HRG      Hex(5)HexNAc(4)NeuAc(2)
#>  7 V7       P04196  HRG      Hex(5)HexNAc(4)        
#>  8 V8       P04196  HRG      Hex(5)HexNAc(4)dHex(2) 
#>  9 V9       P04196  HRG      Hex(4)HexNAc(3)        
#> 10 V10      P04196  HRG      Hex(4)HexNAc(4)NeuAc(1)
#> # ℹ 3,870 more rows
#> # ℹ 12 more variables: glycan_structure <struct>, protein_site <int>, Tp <fct>,
#> #   B <lgl>, nA <int>, nF <int>, nFc <int>, nFa <int>, nG <int>, nGt <int>,
#> #   nS <int>, nM <int>

Look at that transformation! Your variable information is now enriched with multiple meta-property columns. This opens up powerful filtering possibilities based on structural features.

For instance, let’s filter for all glycoforms containing high-mannose glycans:

exp_with_mp |>
  filter_var(Tp == "highmannose")
#> 
#> ── Glycoproteomics Experiment ──────────────────────────────────────────────────
#>  Expression matrix: 12 samples, 207 variables
#>  Sample information fields: group <chr>
#>  Variable information fields: protein <chr>, gene <chr>, glycan_composition <comp>, glycan_structure <struct>, protein_site <int>, Tp <fct>, B <lgl>, nA <int>, nF <int>, nFc <int>, nFa <int>, nG <int>, nGt <int>, nS <int>, nM <int>
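Conditions can also combine several meta-properties. The base-R sketch below applies one combined condition to the meta-properties of the five example glycans shown earlier; the variable identifiers are made up. If filter_var() evaluates its expression the way dplyr::filter() does, the same logical expression should carry over, but that is an assumption rather than something demonstrated here.

```r
# Meta-properties of the five example glycans (values from the table above)
var_info <- data.frame(
  variable = paste0("G", 1:5),  # hypothetical identifiers
  Tp = factor(
    c("complex", "hybrid", "complex", "complex", "complex"),
    levels = c("paucimannose", "hybrid", "highmannose", "complex")
  ),
  nS = c(2L, 1L, 0L, 1L, 0L),
  nGt = c(0L, 0L, 2L, 1L, 1L)
)

# Complex glycans that are sialylated and have no exposed terminal galactose
keep <- var_info$Tp == "complex" & var_info$nS > 0 & var_info$nGt == 0
var_info[keep, ]
```

Only the first glycan passes: it is complex, carries two sialic acids, and has no terminal galactose left.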

🧰 Meta-Property Functions: Your Structural Toolkit

Behind the scenes, meta-properties are actually functions that take glyrepr::glycan_structure() vectors and return corresponding property values. glydet comes packed with a comprehensive library of built-in meta-property functions:

names(all_mp_fns())
#>  [1] "Tp"  "B"   "nA"  "nF"  "nFc" "nFa" "nG"  "nGt" "nS"  "nM"

📚 Your complete toolkit: Here’s the full roster of built-in meta-property functions:

Name  Function          Description
Tp    n_glycan_type()   Type of the glycan: “complex”, “hybrid”, “highmannose”, or “paucimannose”
B     has_bisecting()   Whether the glycan has a bisecting GlcNAc
nA    n_antennae()      Number of antennae
nF    n_fuc()           Number of fucoses
nFc   n_core_fuc()      Number of core fucoses
nFa   n_arm_fuc()       Number of arm fucoses
nG    n_gal()           Number of galactoses
nGt   n_terminal_gal()  Number of terminal galactoses
nS    n_sia()           Number of sialic acids
nM    n_man()           Number of mannoses

Each function can be called directly for quick structural analysis:

n_glycan_type(glycans)
#> [1] complex hybrid  complex complex complex
#> Levels: paucimannose hybrid highmannose complex

🧩 Working with Structural Ambiguity

An important design principle of glydet is its ability to handle glycan structures with varying levels of detail. All built-in meta-properties and derived traits are designed to work with the minimum information typically available for N-glycans in most experimental scenarios.

🔧 Generic vs. Specific Monosaccharides

glydet works seamlessly with generic monosaccharide names (e.g., “Hex”, “HexNAc”, “dHex”) and structures lacking linkage information. This level of structural resolution reflects what is commonly achievable in glycoproteomics workflows, where complete structural determination is often challenging.

For example, this ambiguous structure works perfectly:

# Generic monosaccharides with unknown linkages ❓
ambiguous_glycan <- "HexNAc(??-?)Hex(??-?)[Hex(??-?)]Hex(??-?)HexNAc(??-?)[dHex(??-?)]HexNAc(??-"

✨ Handling Detailed Structures

Designing for minimal information does not prevent glydet from handling well-characterized structures. The package works equally well with glycans that carry complete structural information:

# Fully specified structure with specific monosaccharides and linkages ✅
detailed_glycan <- "GlcNAc(b1-2)Man(a1-3)[Man(a1-6)]Man(b1-4)GlcNAc(b1-4)[Fuc(a1-6)]GlcNAc(b1-"
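The two example strings describe the same topology at different levels of detail. The sketch below shows that relationship with plain string substitution. to_generic() is a hypothetical helper written for this illustration (it is not part of glydet or glyrepr), and its name mapping covers only a handful of common residues.

```r
# Hypothetical helper: strip residue identities and linkages from an
# IUPAC-condensed string, leaving only generic names and unknown linkages
to_generic <- function(iupac) {
  # Longer names come first so "GlcNAc" is not clobbered by the "Glc" rule
  mapping <- c(
    GlcNAc = "HexNAc", GalNAc = "HexNAc",
    Man = "Hex", Gal = "Hex", Glc = "Hex",
    Fuc = "dHex"
  )
  for (specific in names(mapping)) {
    iupac <- gsub(specific, mapping[[specific]], iupac, fixed = TRUE)
  }
  # Known linkages such as (b1-4) become the unknown form (??-?)
  iupac <- gsub("\\([ab][0-9]-[0-9]\\)", "(??-?)", iupac)
  # The reducing end is written with an open linkage, e.g. "(b1-"
  sub("\\([ab][0-9]-$", "(??-", iupac)
}

detailed <- "GlcNAc(b1-2)Man(a1-3)[Man(a1-6)]Man(b1-4)GlcNAc(b1-4)[Fuc(a1-6)]GlcNAc(b1-"
ambiguous <- "HexNAc(??-?)Hex(??-?)[Hex(??-?)]Hex(??-?)HexNAc(??-?)[dHex(??-?)]HexNAc(??-"

identical(to_generic(detailed), ambiguous)
#> [1] TRUE
```

Because the generic form is a strict loss of information, anything computed from it (such as the built-in meta-properties) can also be computed from the detailed form.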

🚀 Extending Functionality

When working with highly detailed structural information, you may want to create specialized meta-property functions that leverage specific monosaccharide identities or linkage patterns. This allows you to define custom derived traits that capture structural features beyond the generic framework provided by the built-in functions.
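As a hint of what such a function might look like, here is a string-based sketch that counts α2,6-linked sialic acids, a feature that cannot be read from structures without linkage information. count_a26_sia() is hypothetical and works on raw IUPAC text for simplicity; a real meta-property function would take a glyrepr::glycan_structure() vector instead.

```r
# Hypothetical linkage-aware property: number of a2-6 linked Neu5Ac residues.
# Vectorized: one count per input string.
count_a26_sia <- function(iupac) {
  hits <- gregexpr("Neu5Ac(a2-6)", iupac, fixed = TRUE)
  lengths(regmatches(iupac, hits))
}

# Illustrative fully specified bi-antennary N-glycans
detailed_glycans <- c(
  "Neu5Ac(a2-6)Gal(b1-4)GlcNAc(b1-2)Man(a1-3)[Neu5Ac(a2-3)Gal(b1-4)GlcNAc(b1-2)Man(a1-6)]Man(b1-4)GlcNAc(b1-4)GlcNAc(b1-",
  "Gal(b1-4)GlcNAc(b1-2)Man(a1-3)[Gal(b1-4)GlcNAc(b1-2)Man(a1-6)]Man(b1-4)GlcNAc(b1-4)GlcNAc(b1-"
)

count_a26_sia(detailed_glycans)
#> [1] 1 0
```

The first glycan carries one α2,6- and one α2,3-linked sialic acid, so a generic sialic acid count would report 2 while this linkage-aware count reports 1.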

Working with Glycomics Data

Working with glycomics data is no different from working with glycoproteomics data. If anything, it is more straightforward, because the resulting experiment() has a simpler structure. Here we briefly demonstrate the workflow using glyexp::real_experiment2.

exp <- auto_clean(real_experiment2)
#>  Normalizing data (Median Quotient)
#>  Normalizing data (Median Quotient) [15ms]
#> 
#>  Removing variables with >50% missing values
#>  Removing variables with >50% missing values [14ms]
#> 
#>  Imputing missing values
#>  Sample size > 100, using MissForest imputation
#>  Imputing missing values [15.8s]
#> 
#>  Normalizing data (Total Area)
#>  Normalizing data (Total Area) [13ms]
trait_exp <- derive_traits(exp)
trait_exp
#> 
#> ── Traitomics Experiment ───────────────────────────────────────────────────────
#>  Expression matrix: 144 samples, 14 variables
#>  Sample information fields: group <chr>
#>  Variable information fields: trait <chr>

What’s Next?

You now have a working understanding of glydet and how to use it. From here, you can try all_traits() to calculate more advanced and detailed derived traits, or start defining your own meta-property functions and derived traits. The Custom Traits vignette explains how to define your own derived traits.