Getting Started with glyrepr

Welcome to the world of glycan analysis! If you’ve ever tried to work with glycans computationally, you know the struggle: these tree-like molecules are notoriously difficult to represent and analyze compared to their linear cousins like proteins or DNA. That’s where glyrepr comes to the rescue.

Think of glyrepr as your glycan translator — it teaches your computer how to “speak glycan” fluently. Whether you’re dealing with compositions (what’s in the glycan) or structures (how it’s connected), this package has got you covered.

library(glyrepr)

Quick Start: What Are We Talking About?

Before we dive in, let’s establish our vocabulary. Don’t worry — it’s simpler than it sounds!

Term	What It Means	Example
Composition	The “ingredients list” — how many of each sugar	`Hex(5)HexNAc(2)`
Structure	The “blueprint” — how sugars are connected	`Man(a1-3)Man(b1-4)GlcNAc`
Monosaccharide	A single sugar unit (the building blocks)	`Gal`, `Man`, `Hex`
Linkage	The “glue” between sugars	`a1-3`, `b1-4`
Substitution	Chemical decorations on sugars	`6Ac`, `3Me`

🔍 Pro tip: We distinguish between generic sugars (like mystery boxes labeled “Hex”) and concrete sugars (like specific boxes labeled “Galactose”).
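To make the vocabulary concrete, here is a small base-R sketch that pulls the monosaccharides and linkages out of the example structure string from the table. It is only an illustration for simple, unbranched strings and is not how glyrepr parses structures; the variable names (`iupac`, `parts`, `donor_pos`, `acceptor_pos`) are made up for this example.

```r
# Illustration only: dissect a simple, unbranched IUPAC-condensed string
# with base R regular expressions (not glyrepr's own parser).
iupac <- "Man(a1-3)Man(b1-4)GlcNAc"

# Linkages are the parenthesised parts, e.g. "(a1-3)".
linkages <- regmatches(iupac, gregexpr("\\([^()]+\\)", iupac))[[1]]
linkages <- gsub("[()]", "", linkages)

# Monosaccharides are what remains once the linkages are removed.
residues <- strsplit(gsub("\\([^()]*\\)", " ", iupac), " ", fixed = TRUE)[[1]]

# A linkage such as "a1-3" reads: anomer "a", anomeric carbon 1 of the
# attached sugar, bonded to position 3 of the next sugar.
parts <- data.frame(
  linkage      = linkages,
  anomer       = substr(linkages, 1, 1),
  donor_pos    = sub("^.(.)-.*$", "\\1", linkages),
  acceptor_pos = sub("^.*-", "", linkages),
  stringsAsFactors = FALSE
)

residues
parts
```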

Part 1: Compositions — The Easy Start

Let’s start with something straightforward: glycan compositions. Think of these as ingredient lists for your favorite recipes.

Creating Your First Compositions

There are three ways to create compositions, each with its own superpower:

Method 1: The Direct Approach

# Just tell R what you have
glycan_composition(c(Hex = 5, HexNAc = 2), c(Gal = 1, GalNAc = 1))
#> <glycan_composition[2]>
#> [1] Hex(5)HexNAc(2)
#> [2] Gal(1)GalNAc(1)

Method 2: The Programmatic Way

# Perfect when you're processing data from files or databases
comp_list <- list(c(Hex = 5, HexNAc = 2), c(Gal = 1, GalNAc = 1))
as_glycan_composition(comp_list)
#> <glycan_composition[2]>
#> [1] Hex(5)HexNAc(2)
#> [2] Gal(1)GalNAc(1)

Method 3: The Parser

# Copy-paste from your mass spec software? No problem!
as_glycan_composition(c("Hex(5)HexNAc(2)", "Gal(1)GalNAc(1)"))
#> <glycan_composition[2]>
#> [1] Hex(5)HexNAc(2)
#> [2] Gal(1)GalNAc(1)
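If you are curious what parsing such a string involves, the sketch below turns one composition string into a named integer vector, the same shape that Method 1 takes as input. The helper `parse_composition_sketch()` is hypothetical and written in base R; it is not the parser that as_glycan_composition() uses, and it does no validation.

```r
# Hypothetical helper (base R only): turn "Hex(5)HexNAc(2)" into a named
# integer vector. No validation is done; this is just the core idea.
parse_composition_sketch <- function(x) {
  # Each token looks like "Name(count)", e.g. "HexNAc(2)".
  tokens <- regmatches(x, gregexpr("[A-Za-z0-9]+\\([0-9]+\\)", x))[[1]]
  counts <- as.integer(sub("^.*\\(([0-9]+)\\)$", "\\1", tokens))
  names(counts) <- sub("\\(.*$", "", tokens)
  counts
}

parsed <- parse_composition_sketch("Hex(5)HexNAc(2)")
parsed
```

The result has the same form as `c(Hex = 5, HexNAc = 2)` from Method 1, which is why the three methods end up with identical compositions.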

The Magic of Colors 🌈

Here’s something cool: when you run these examples in your R console, the concrete monosaccharides (like Gal and GalNAc) are displayed in color! The colors follow the SNFG (Symbol Nomenclature for Glycans) standard, the widely used “color code” for glycans. Think of it as the glycan rainbow 🌈.

Smart Counting with `count_mono()`

Now here’s where glyrepr shows its intelligence:

comp <- glycan_composition(
  c(Hex = 5, HexNAc = 2),          # generic sugars
  c(Gal = 1, Man = 1, GalNAc = 1)  # concrete sugars
)

# How many galactose residues?
count_mono(comp, "Gal")
#> [1] NA  1

# How many hexose residues? (This includes Gal and Man!)
count_mono(comp, "Hex")
#> [1] 5 2

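Notice how count_mono() is smart enough to know that galactose and mannose are both hexoses? That’s the power of understanding glycan hierarchies! The NA in the first result makes sense too: the first composition is generic, so there is no way to tell how many of its five hexoses are galactose.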
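The hierarchy idea can be sketched with a plain lookup table that maps each concrete monosaccharide to its generic class. Everything here (`generic_of`, `count_generic_sketch()`) is hypothetical and covers only a handful of sugars; glyrepr's real implementation may work quite differently.

```r
# Hypothetical lookup table: concrete monosaccharide -> generic class.
generic_of <- c(Gal = "Hex", Man = "Hex", Glc = "Hex",
                GalNAc = "HexNAc", GlcNAc = "HexNAc")

# Count residues of one generic class in a named count vector.
count_generic_sketch <- function(comp, generic) {
  # Names that are already generic (e.g. "Hex") map to themselves.
  classes <- ifelse(names(comp) %in% names(generic_of),
                    unname(generic_of[names(comp)]),
                    names(comp))
  sum(comp[classes == generic])
}

n_generic  <- count_generic_sketch(c(Hex = 5, HexNAc = 2), "Hex")
n_concrete <- count_generic_sketch(c(Gal = 1, Man = 1, GalNAc = 1), "Hex")
c(n_generic, n_concrete)
```

With the two compositions from the example above, the sketch gives 5 and 2, matching what count_mono(comp, "Hex") reported.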

Part 2: Structures — Where the Magic Happens

Compositions are nice, but structures are where glyrepr truly shines. This is like going from knowing the ingredients to understanding the actual recipe and cooking method.

Your First Glycan Structures

Let’s work with some real glycan structures. The strings below use “IUPAC-condensed” notation, a compact text representation of glycans. They might look cryptic, but they’re actually quite readable once you get the hang of it. To learn about them, check out this article.

iupacs <- c(
  "Man(a1-3)[Man(a1-6)]Man(b1-4)GlcNAc(b1-4)GlcNAc(b1-",  # The famous N-glycan core
  "Gal(b1-3)GalNAc(a1-",                                    # O-glycan core 1
  "Gal(b1-3)[GlcNAc(b1-6)]GalNAc(a1-",                     # O-glycan core 2
  "Man(a1-3)[Man(a1-6)]Man(a1-3)[Man(a1-6)]Man",          # A branched mannose tree
  "GlcNAc6Ac(b1-4)Glc3Me(a1-"                              # With some decorations
)

struc <- as_glycan_structure(iupacs)
struc
#> <glycan_structure[5]>
#> [1] Man(a1-3)[Man(a1-6)]Man(b1-4)GlcNAc(b1-4)GlcNAc(b1-
#> [2] Gal(b1-3)GalNAc(a1-
#> [3] Gal(b1-3)[GlcNAc(b1-6)]GalNAc(a1-
#> [4] Man(a1-3)[Man(a1-6)]Man(a1-3)[Man(a1-6)]Man(?1-
#> [5] GlcNAc6Ac(b1-4)Glc3Me(a1-
#> # Unique structures: 5

The Secret Sauce: Unique Structure Optimization

Here’s where glyrepr gets really clever. Notice that “# Unique structures: 5” message? This isn’t just informational — it’s the key to lightning-fast performance.

Let’s see this optimization in action:

# Create a big dataset with lots of repetition
large_struc <- rep(struc, 1000)  # 5,000 structures total
large_struc
#> <glycan_structure[5000]>
#> [1] Man(a1-3)[Man(a1-6)]Man(b1-4)GlcNAc(b1-4)GlcNAc(b1-
#> [2] Gal(b1-3)GalNAc(a1-
#> [3] Gal(b1-3)[GlcNAc(b1-6)]GalNAc(a1-
#> [4] Man(a1-3)[Man(a1-6)]Man(a1-3)[Man(a1-6)]Man(?1-
#> [5] GlcNAc6Ac(b1-4)Glc3Me(a1-
#> [6] Man(a1-3)[Man(a1-6)]Man(b1-4)GlcNAc(b1-4)GlcNAc(b1-
#> [7] Gal(b1-3)GalNAc(a1-
#> [8] Gal(b1-3)[GlcNAc(b1-6)]GalNAc(a1-
#> [9] Man(a1-3)[Man(a1-6)]Man(a1-3)[Man(a1-6)]Man(?1-
#> [10] GlcNAc6Ac(b1-4)Glc3Me(a1-
#> ... (4990 more not shown)
#> # Unique structures: 5

Still showing “# Unique structures: 5”! This means glyrepr is storing only 5 unique graphs internally, not 5,000. This is like having a smart library system that stores only one copy of each book, no matter how many people want to read it.
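If the “one copy of each book” idea feels abstract, base R's factor type is a familiar analogy: it keeps each distinct value once (the levels) plus one integer code per element. This is only an analogy built on two of the strings above; it says nothing about how glyrepr lays out its data internally.

```r
# Analogy only: a factor stores each distinct value once (its levels)
# and an integer code per element pointing at one of those levels.
x <- rep(c("Gal(b1-3)GalNAc(a1-", "Gal(b1-3)[GlcNAc(b1-6)]GalNAc(a1-"),
         times = 2500)
f <- factor(x)

length(f)                      # 5000 elements ...
nlevels(f)                     # ... but only 2 distinct values stored
identical(as.character(f), x)  # TRUE: the full vector can be rebuilt
```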

Performance That Will Blow Your Mind 🚀

Let’s put this to the test:

library(tictoc)

tic("Converting 5 structures")
result_small <- convert_mono_type(struc, "generic")
toc()
#> Converting 5 structures: 0.025 sec elapsed

tic("Converting 5,000 structures")
result_large <- convert_mono_type(large_struc, "generic")
toc()
#> Converting 5,000 structures: 0.025 sec elapsed

Mind = blown! 🤯 The performance is nearly identical because glyrepr only processes each unique structure once, then cleverly expands the results.
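The “process once, then expand” trick is easy to sketch in base R. The helper `map_unique_sketch()` and the stand-in function `slow_upper()` below are hypothetical; they only illustrate the general technique on plain strings, not glyrepr's actual code.

```r
# Hypothetical helper: apply f() to each distinct value once, then expand
# the results back to the original length and order.
map_unique_sketch <- function(x, f) {
  uniq <- unique(x)
  results <- vapply(uniq, f, character(1), USE.NAMES = FALSE)
  results[match(x, uniq)]
}

# A stand-in for an expensive per-structure operation that counts its calls.
calls <- 0
slow_upper <- function(s) {
  calls <<- calls + 1
  toupper(s)
}

x <- rep(c("a", "b"), times = 2500)   # 5,000 elements, 2 distinct values
out <- map_unique_sketch(x, slow_upper)

length(out)  # 5000
calls        # 2: the function ran once per distinct value
```

The cost then grows with the number of distinct values rather than the length of the vector, which is consistent with the two timings above being nearly identical.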

Structure Manipulation Tools

glyrepr comes with several handy tools for structure manipulation:

Strip away the linkages (note that the reducing-end anomer, such as the final b1- or a1-, is kept):

remove_linkages(struc)
#> <glycan_structure[5]>
#> [1] Man(??-?)[Man(??-?)]Man(??-?)GlcNAc(??-?)GlcNAc(b1-
#> [2] Gal(??-?)GalNAc(a1-
#> [3] GlcNAc(??-?)[Gal(??-?)]GalNAc(a1-
#> [4] Man(??-?)[Man(??-?)]Man(??-?)[Man(??-?)]Man(?1-
#> [5] GlcNAc6Ac(??-?)Glc3Me(a1-
#> # Unique structures: 5

Remove the decorations:

# Let's look at our decorated structure first
struc[5]
#> <glycan_structure[1]>
#> [1] GlcNAc6Ac(b1-4)Glc3Me(a1-
#> # Unique structures: 1

# Now remove the decorations (6Ac and 3Me)
remove_substituents(struc[5])
#> <glycan_structure[1]>
#> [1] GlcNAc(b1-4)Glc(a1-
#> # Unique structures: 1

Part 3: Conversions and Integrations

From Structure to Composition

Ever wondered what’s actually in those complex structures? Easy:

comp <- as_glycan_composition(struc)
comp
#> <glycan_composition[5]>
#> [1] Man(3)GlcNAc(2)
#> [2] Gal(1)GalNAc(1)
#> [3] Gal(1)GlcNAc(1)GalNAc(1)
#> [4] Man(5)
#> [5] Glc(1)GlcNAc(1)Me(1)Ac(1)

Back to Strings

Need to export your data or use it elsewhere?

# Convert back to plain character vectors
as.character(struc)
#> [1] "Man(a1-3)[Man(a1-6)]Man(b1-4)GlcNAc(b1-4)GlcNAc(b1-"
#> [2] "Gal(b1-3)GalNAc(a1-"                                
#> [3] "Gal(b1-3)[GlcNAc(b1-6)]GalNAc(a1-"                  
#> [4] "Man(a1-3)[Man(a1-6)]Man(a1-3)[Man(a1-6)]Man(?1-"    
#> [5] "GlcNAc6Ac(b1-4)Glc3Me(a1-"
as.character(comp)
#> [1] "Man(3)GlcNAc(2)"           "Gal(1)GalNAc(1)"          
#> [3] "Gal(1)GlcNAc(1)GalNAc(1)"  "Man(5)"                   
#> [5] "Glc(1)GlcNAc(1)Me(1)Ac(1)"
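Once you have plain strings, exporting is ordinary R. The sketch below writes two of the strings shown above to a CSV file and reads them back unchanged. The vector `iupac_strings` is typed in by hand to stand in for the result of as.character(struc), and the column names and temp file are arbitrary choices for this example.

```r
# Stand-in for as.character(struc): two of the strings shown above.
iupac_strings <- c("Gal(b1-3)GalNAc(a1-", "GlcNAc6Ac(b1-4)Glc3Me(a1-")

export <- data.frame(
  id = seq_along(iupac_strings),
  structure = iupac_strings,
  stringsAsFactors = FALSE
)

# Write to a temporary CSV file and read it back.
path <- tempfile(fileext = ".csv")
write.csv(export, path, row.names = FALSE)
reimported <- read.csv(path, stringsAsFactors = FALSE)

identical(reimported$structure, iupac_strings)  # TRUE: strings survive intact
```

After re-importing, the `structure` column can be passed to as_glycan_structure() to get structure vectors again.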

Playing Nice with the Tidyverse

glyrepr vectors work as tibble columns and inside dplyr verbs such as mutate() and filter():

suppressPackageStartupMessages(library(tibble))
suppressPackageStartupMessages(library(dplyr))

df <- tibble(
  id = seq_along(struc),
  structures = struc,
  names = c("N-glycan core", "Core 1", "Core 2", "Branched Man", "Decorated")
)

df %>% 
  mutate(n_man = count_mono(structures, "Man")) %>%
  filter(n_man > 1)
#> # A tibble: 2 × 4
#>      id structures                                          names         n_man
#>   <int> <structure>                                         <chr>         <int>
#> 1     1 Man(a1-3)[Man(a1-6)]Man(b1-4)GlcNAc(b1-4)GlcNAc(b1- N-glycan core     3
#> 2     4 Man(a1-3)[Man(a1-6)]Man(a1-3)[Man(a1-6)]Man(?1-     Branched Man      5

What’s Next?

Congratulations! You’ve just learned the fundamentals of glycan representation in R. Here’s what you can explore next:

🔬 Advanced analysis: Check out the “Power User Guide: Efficient Glycan Manipulation” vignette for more advanced features
🧬 Motif searching: Try the glymotif package for finding patterns in glycan structures
📊 Visualization: Explore glycan visualization packages in the glycoverse

The glycoverse is your oyster! 🦪

Session Information

sessionInfo()
#> R version 4.5.1 (2025-06-13)
#> Platform: x86_64-pc-linux-gnu
#> Running under: Ubuntu 24.04.2 LTS
#> 
#> Matrix products: default
#> BLAS:   /usr/lib/x86_64-linux-gnu/openblas-pthread/libblas.so.3 
#> LAPACK: /usr/lib/x86_64-linux-gnu/openblas-pthread/libopenblasp-r0.3.26.so;  LAPACK version 3.12.0
#> 
#> locale:
#>  [1] LC_CTYPE=C.UTF-8       LC_NUMERIC=C           LC_TIME=C.UTF-8       
#>  [4] LC_COLLATE=C.UTF-8     LC_MONETARY=C.UTF-8    LC_MESSAGES=C.UTF-8   
#>  [7] LC_PAPER=C.UTF-8       LC_NAME=C              LC_ADDRESS=C          
#> [10] LC_TELEPHONE=C         LC_MEASUREMENT=C.UTF-8 LC_IDENTIFICATION=C   
#> 
#> time zone: UTC
#> tzcode source: system (glibc)
#> 
#> attached base packages:
#> [1] stats     graphics  grDevices utils     datasets  methods   base     
#> 
#> other attached packages:
#> [1] dplyr_1.1.4   tibble_3.3.0  tictoc_1.2.1  glyrepr_0.5.0
#> 
#> loaded via a namespace (and not attached):
#>  [1] jsonlite_2.0.0    compiler_4.5.1    tidyselect_1.2.1  stringr_1.5.1    
#>  [5] jquerylib_0.1.4   systemfonts_1.2.3 textshaping_1.0.1 yaml_2.3.10      
#>  [9] fastmap_1.2.0     R6_2.6.1          generics_0.1.4    igraph_2.1.4     
#> [13] knitr_1.50        backports_1.5.0   checkmate_2.3.2   rstackdeque_1.1.1
#> [17] desc_1.4.3        bslib_0.9.0       pillar_1.10.2     rlang_1.1.6      
#> [21] utf8_1.2.6        cachem_1.1.0      stringi_1.8.7     xfun_0.52        
#> [25] fs_1.6.6          sass_0.4.10       cli_3.6.5         pkgdown_2.1.3    
#> [29] magrittr_2.0.3    digest_0.6.37     lifecycle_1.0.4   vctrs_0.6.5      
#> [33] evaluate_1.0.4    glue_1.8.0        ragg_1.4.0        rmarkdown_2.29   
#> [37] purrr_1.0.4       tools_4.5.1       pkgconfig_2.0.3   htmltools_0.5.8.1