
Glycan Graphs: The Network Behind Your Sugar Structures 🕸️
⚠️ Advanced Users Alert: This vignette is tailored for those already familiar with graph theory and the igraph package. If you’re new to these concepts, we recommend checking out the igraph documentation first!
The Hidden Graph Universe of Glycans 🌌
Think of glycans as nature’s own social networks: they’re naturally represented as directed graphs, specifically as out-trees (rooted trees whose edges all point away from the root), where each sugar “talks” to its neighbors in a very structured way.
Behind the scenes, every glycan_structure() object is actually powered by an igraph object. The beauty of glycoverse is that most users can work with the intuitive concept of “glycan structures” without getting lost in the graph theory weeds 🌾. But for you power users who want to peek under the hood, this guide is your treasure map! 🗺️
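To see the out-tree idea in plain igraph terms, here is a minimal sketch that rebuilds only the topology of the N-glycan core (five residues and four bonds, the same shape get_structure_graphs() prints later in this vignette) with igraph::make_graph(). It does not call glyrepr at all; the vertex numbering is an assumption chosen to match that printed edge list.

```r
# Topology only: 1->2->3, then residue 3 branches to 4 and 5
# (edges point away from the reducing end, i.e. the root)
g <- igraph::make_graph(c(1, 2, 2, 3, 3, 4, 3, 5), directed = TRUE)

# Out-tree: connected, n - 1 edges, every vertex reachable from one root along out-edges
igraph::is_tree(g, mode = "out")   # TRUE
# Not an in-tree: residue 3 has two outgoing edges, so edges do not converge on a root
igraph::is_tree(g, mode = "in")    # FALSE

# The root (reducing end) is the only vertex without an incoming edge
which(igraph::degree(g, mode = "in") == 0)   # 1
```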
What’s Actually Stored in Memory? 🧠
Representing a glycan in computer memory is like trying to pack for a month-long trip in a carry-on bag – you need to decide what’s absolutely essential!
A glycan has tons of information: linear oriented C-atoms, basetype (the stereochemical skeleton), substituents, configuration, anomeric center, ring size, linkage positions… the list goes on! 📝
Some packages (like Python’s glypy) take the “pack everything” approach 🎒, storing every tiny detail. This comprehensive strategy is fantastic for specialized tasks like MS/MS spectra simulation, but it can be overkill for everyday omics research.
glyrepr takes a more minimalist approach ✨. Our philosophy: if you can derive it from an IUPAC-condensed text representation, we’ll store it. Everything else? We let it go. This means we skip details like configuration and ring size, and that’s usually just fine, since the common monosaccharides in omics data almost always occur in one conventional form (Glc, for example, is nearly always D-glucopyranose).
💡 Pro Tip: Want to master IUPAC-condensed notation? Check out this comprehensive guide.
Extracting the Graph: Show Me the Network! 🔍
You can’t just throw igraph functions at a glycan_structure() object: they speak different languages! Instead, let’s extract the underlying graph using get_structure_graphs():
glycan <- n_glycan_core()
graph <- get_structure_graphs(glycan)
graph
#> IGRAPH 25b208a DN-- 5 4 --
#> + attr: anomer (g/c), alditol (g/l), name (v/c), mono (v/c), sub (v/c),
#> | linkage (e/c)
#> + edges from 25b208a (vertex names):
#> [1] 1->2 2->3 3->4 3->5
Let’s decode what we’re seeing here 🕵️:
First line: Directed Named (“DN”) graph with 5 vertices (sugar units) and 4 edges (bonds). Think of it as a family tree with 5 people and 4 relationships.
Graph-level attributes:
- anomer 🔄: The anomeric configuration of the reducing end (the “root” of our tree)
- alditol 🧪: Whether the reducing end is an alditol (spoiler: usually it’s not!)
Vertex attributes (the sugar units themselves):
- name 🏷️: Unique ID for each sugar (like social security numbers)
- mono 🍬: The actual sugar type (“Hex”, “HexNAc”, etc.)
- sub ⚗️: Any chemical decorations attached to the sugar
Edge attributes (the connections):
- linkage 🔗: How the sugars are connected (the anomeric configuration of the child residue plus both linkage positions, such as “a1-3”)
Connection pattern: “1->2” means an edge from vertex 1 (the parent) to vertex 2 (the child). We treat bonds as arrows pointing from the reducing end (the root) toward the non-reducing ends (the branches). Real glycosidic bonds aren’t actually directional arrows; this convention just makes coding easier! 😅
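To make this layout concrete, the sketch below assembles an igraph object with the same shape and attribute names as the printout above, using only igraph::graph_from_data_frame() and igraph::set_graph_attr(). It is an illustration, not how glyrepr builds its graphs internally. The linkage strings are taken from the IUPAC-condensed N-glycan core, Man(a1-3)[Man(a1-6)]Man(b1-4)GlcNAc(b1-4)GlcNAc; which of vertices 4 and 5 carries a1-3 versus a1-6 is an assumption, and the "?1" anomer and FALSE alditol values mirror the graph-level output shown further down.

```r
# Vertices: unique name + monosaccharide + substituent string ("" = none)
vertices <- data.frame(
  name = c("1", "2", "3", "4", "5"),
  mono = c("GlcNAc", "GlcNAc", "Man", "Man", "Man"),
  sub  = c("", "", "", "", ""),
  stringsAsFactors = FALSE
)

# Edges point from parent (toward the reducing end) to child;
# the linkage string is stored as an edge attribute
edges <- data.frame(
  from    = c("1", "2", "3", "3"),
  to      = c("2", "3", "4", "5"),
  linkage = c("b1-4", "b1-4", "a1-3", "a1-6"),  # a1-3/a1-6 assignment to 4/5 is assumed
  stringsAsFactors = FALSE
)

g <- igraph::graph_from_data_frame(edges, directed = TRUE, vertices = vertices)

# Graph-level attributes describe the reducing end
g <- igraph::set_graph_attr(g, "anomer", "?1")
g <- igraph::set_graph_attr(g, "alditol", FALSE)

g  # header should read "DN-- 5 4": directed, named, 5 vertices, 4 edges
```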
Want to see it visually? igraph has got you covered:
plot(graph)
Deep Dive: Dissecting the Components 🔬
Vertices: Meet Your Sugar Cast 🎭
Each vertex represents a monosaccharide with three key properties:
🏷️ Names (Unique IDs): These are auto-generated identifiers, usually integers stored as character strings, but they could be anything as long as they’re unique:
igraph::V(graph)$name
#> [1] "1" "2" "3" "4" "5"
🍬 Monosaccharides (The Star Players): These are IUPAC-condensed names like “Hex”, “HexNAc”, “Glc”, “GlcNAc”. Think of them as the “job titles” of your sugars:
igraph::V(graph)$mono
#> [1] "GlcNAc" "GlcNAc" "Man" "Man" "Man"
📚 Reference: For the complete cast of available monosaccharides, check SNFG notation or run available_monosaccharides().
⚗️ Substituents (The Accessories): Chemical decorations like “Me” (methyl), “Ac” (acetyl), “S” (sulfate), etc. Position matters! “3Me” = methyl at position 3, “?S” = sulfate at unknown position:
igraph::V(graph)$sub
#> [1] "" "" "" "" ""
Got multiple decorations? No problem! They’re comma-separated and sorted by position:
glycan2 <- as_glycan_structure("Glc3Me6S(a1-")
graph2 <- get_structure_graphs(glycan2)
igraph::V(graph2)$sub
#> [1] "3Me,6S"
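If you need the individual decorations rather than the combined string, splitting on commas is enough. The parse_sub() helper below is hypothetical (not part of glyrepr) and assumes each entry is a position (digits or “?”) immediately followed by the substituent name, as in the examples above.

```r
# Hypothetical helper: split a substituent string into position/group pairs
parse_sub <- function(x) {
  if (!nzchar(x)) {
    # No decorations: return an empty table with the same columns
    return(data.frame(position = character(0), group = character(0),
                      stringsAsFactors = FALSE))
  }
  parts <- strsplit(x, ",", fixed = TRUE)[[1]]      # "3Me,6S" -> c("3Me", "6S")
  data.frame(
    position = sub("^([0-9?]+).*$", "\\1", parts),  # leading digits or "?"
    group    = sub("^[0-9?]+", "", parts),          # the rest is the substituent name
    stringsAsFactors = FALSE
  )
}

parse_sub("3Me,6S")   # two rows: position "3" / group "Me", position "6" / group "S"
parse_sub("?S")       # one row: unknown position "?", group "S"
```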
Edges: The Relationship Status 💕
Edges represent glycosidic bonds with a simple but powerful format:
<target anomeric config><target position> - <source position>
Here’s a real example where “Gal” has an “a” anomeric configuration, linking from position 3 of “GalNAc” to position 1 of “Gal”:
glycan3 <- as_glycan_structure("Gal(a1-3)GalNAc(b1-")
graph3 <- get_structure_graphs(glycan3)
igraph::E(graph3)$linkage
#> [1] "a1-3"
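Because the format is fixed, pulling the three fields back out is a one-regex job. The parse_linkage() helper below is hypothetical and assumes every field is a single character (covering strings like “a1-3” or “a2-3”); any richer notation would need a more permissive pattern.

```r
# Hypothetical helper: split a linkage string into its three fields
parse_linkage <- function(linkage) {
  # Assumed shape: anomer (a/b/?), target position, "-", source position
  m <- regmatches(linkage, regexec("^([ab?])([0-9?])-([0-9?])$", linkage))[[1]]
  if (length(m) == 0) stop("Unrecognised linkage: ", linkage)
  list(anomer = m[2], target_pos = m[3], source_pos = m[4])
}

# For Gal(a1-3)GalNAc: anomer "a", C1 of Gal (target), C3 of GalNAc (source)
str(parse_linkage("a1-3"))
```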
🤔 Why encode anomer info in edges? We debated this! It might seem more natural to store it with vertices, but thinking “Neu5Ac with a2-3 linkage” flows better mentally and matches IUPAC notation perfectly.
Graph-Level Attributes: The Global Settings ⚙️
🔄 Anomer: The anomeric configuration of the reducing end (the “root” sugar, whose anomeric carbon is not linked to another residue)
🧪 Alditol: Whether the reducing end is an alditol (rarely used in omics, but included for database compatibility)
graph$anomer
#> [1] "?1"
graph$alditol
#> [1] FALSE
Now for the Fun Part: What Can You Do? 🎉
Unleash the Power of igraph 💪
Once you understand the graph structure, the entire igraph universe opens up!
Example 1: Count branched structures (sugars with multiple children):
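One way to do this is to count residues whose out-degree exceeds 1. The self-contained sketch below rebuilds the N-glycan core topology with igraph::make_graph() instead of calling get_structure_graphs(), so it runs with igraph alone; with the graph object from earlier you would pass graph in place of g.

```r
# N-glycan core topology: 1->2->3, with residue 3 branching to 4 and 5
g <- igraph::make_graph(c(1, 2, 2, 3, 3, 4, 3, 5), directed = TRUE)

# Out-degree = number of children attached to each residue
out_deg <- igraph::degree(g, mode = "out")

n_branched <- sum(out_deg > 1)
n_branched           # 1
which(out_deg > 1)   # 3: the residue carrying both branches
```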
Example 2: Explore the structure with breadth-first search:
bfs_result <- igraph::bfs(graph, root = 1, mode = "out")
bfs_result$order
#> + 5/5 vertices, named, from 25b208a:
#> [1] 1 2 3 4 5
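Closely related to BFS, the hop distance from the root gives each residue’s depth, and its maximum is the height of the tree. A minimal sketch with igraph::distances(), again on a rebuilt N-glycan core topology rather than the graph object from above:

```r
g <- igraph::make_graph(c(1, 2, 2, 3, 3, 4, 3, 5), directed = TRUE)

# Shortest-path hop counts from vertex 1 (the reducing end), following edge direction
depth <- igraph::distances(g, v = 1, mode = "out")[1, ]

depth        # 0 1 2 3 3
max(depth)   # 3: the longest root-to-leaf path spans three bonds
```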
Level Up with smap Functions 🚀
Working with multiple glycans? You could use purrr:
library(purrr)
glycans <- c(n_glycan_core(), o_glycan_core_1(), o_glycan_core_2())
graphs <- get_structure_graphs(glycans) # Extract graphs first
map_int(graphs, ~ igraph::vcount(.x)) # Then analyze
#> [1] 5 2 3
But glyrepr’s smap functions are way more elegant:
The real magic ✨ of smap functions is how they handle duplicates. Real datasets often contain many identical structures, so smap processes each unique structure once, then efficiently expands the results back to the original length.
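The idea behind that optimisation can be illustrated in a few lines of base R: compute on unique(x), then use match() to map every original element back to its unique representative. The map_unique() helper below is hypothetical, not glyrepr’s implementation, and it uses IUPAC-condensed strings as stand-ins for glycan structures. Its n_residues() function is a crude proxy that assumes every residue is followed by exactly one parenthesised linkage.

```r
# Hypothetical helper: apply f once per distinct element, then expand back
map_unique <- function(x, f, FUN.VALUE) {
  uniq <- unique(x)                 # each distinct structure appears once
  ids  <- match(x, uniq)            # where each original element sits in `uniq`
  res  <- vapply(uniq, f, FUN.VALUE, USE.NAMES = FALSE)
  res[ids]                          # expand back to the original length and order
}

calls <- 0L
# Crude residue count: one "(" per residue in these IUPAC-condensed strings
n_residues <- function(iupac) {
  calls <<- calls + 1L              # track how often the (possibly expensive) work runs
  nchar(iupac) - nchar(gsub("(", "", iupac, fixed = TRUE))
}

structures <- c(
  "Gal(b1-3)GalNAc(b1-",
  "Man(a1-3)[Man(a1-6)]Man(b1-4)GlcNAc(b1-4)GlcNAc(b1-",
  "Gal(b1-3)GalNAc(b1-",
  "Gal(b1-3)GalNAc(b1-"
)

result <- map_unique(structures, n_residues, integer(1))
result   # 2 5 2 2
calls    # 2: one call per distinct structure, not one per element
```

The payoff grows with the duplication rate: four inputs cost only two calls here, and a dataset dominated by a few common glycans saves proportionally more.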
📖 Learn More: Dive deeper into smap wizardry in the dedicated vignette.
Motif Hunting with glymotif 🔍
One of the most exciting applications is identifying biologically meaningful motifs (functional substructures). The glymotif package, built on this graph foundation, specializes in exactly this task.
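glymotif’s matching is far more general than anything shown here, but the basic ingredient (comparing residue and linkage labels across edges) can be sketched with igraph alone. The hypothetical has_linkage() below only asks whether any single parent-child bond matches a given monosaccharide pair and linkage; the graph is a rebuilt N-glycan core whose a1-3/a1-6 assignment to vertices 4 and 5 is assumed.

```r
# Rebuilt N-glycan core (a1-3/a1-6 assignment to vertices 4 and 5 is assumed)
g <- igraph::make_graph(c(1, 2, 2, 3, 3, 4, 3, 5), directed = TRUE)
g <- igraph::set_vertex_attr(g, "mono", value = c("GlcNAc", "GlcNAc", "Man", "Man", "Man"))
g <- igraph::set_edge_attr(g, "linkage", value = c("b1-4", "b1-4", "a1-3", "a1-6"))

# Hypothetical helper: does any edge match parent -(linkage)-> child? (a one-bond "motif")
has_linkage <- function(g, parent, child, linkage) {
  ends <- igraph::ends(g, igraph::E(g), names = FALSE)  # one row per edge: parent id, child id
  mono <- igraph::vertex_attr(g, "mono")
  any(mono[ends[, 1]] == parent &
        mono[ends[, 2]] == child &
        igraph::edge_attr(g, "linkage") == linkage)
}

has_linkage(g, parent = "Man", child = "Man", linkage = "a1-6")     # TRUE
has_linkage(g, parent = "GlcNAc", child = "Gal", linkage = "b1-4")  # FALSE
```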
🎯 Get Started: Check out the glymotif introduction to start your motif hunting adventure!
Wrapping Up: Your Graph Journey Continues 🎯
You’ve just unlocked the graph-powered engine behind glyrepr! You now understand:
- 🏗️ How glycan structures map to directed graphs
- 📊 What information is stored (and what’s deliberately omitted)
- 🔧 How to extract and manipulate the underlying graphs
- 🚀 How to leverage igraph, smap, and glymotif for powerful analyses
The graph representation might seem complex at first, but it’s this solid foundation that enables all the sophisticated glycan analysis capabilities in the glycoverse. Now go forth and explore your glycan networks! 🌟