glycan_structure() creates an efficient glycan structure vector for storing and
processing glycan molecular structures. The function employs hash-based deduplication
mechanisms, making it suitable for glycoproteomics, glycomics analysis, and glycan
structure comparison studies.
Data Structure Overview
A glycan structure vector is a vctrs record with an additional S3 class
glyrepr_structure. Therefore, sloop::s3_class() returns the class hierarchy
c("glyrepr_structure", "vctrs_rcrd").
Each glycan structure must satisfy the following constraints:
Graph Structure Requirements
Must be a directed graph with an outward tree structure (reducing end as root)
Must have a graph attribute
anomerin the format "a1" or "b1"Unknown parts can be represented with "?", e.g., "?1", "a?", "??"
Node Attributes
mono: Monosaccharide names, must be known monosaccharide typesGeneric names: Hex, HexNAc, dHex, NeuAc, etc.
Concrete names: Glc, Gal, Man, GlcNAc, etc.
Cannot mix generic and concrete names
NA values are not allowed
sub: Substituent informationSingle substituent format: "xY" (x = position, Y = substituent name), e.g., "2Ac", "3S"
Multiple substituents separated by commas and ordered by position, e.g., "3Me,4Ac", "2S,6P"
No substituents represented by empty string ""
Node and Edge Order
The indices of vertices and linkages in a glycan correspond directly to their
order in the IUPAC-condensed string, which is printed when you print a
glycan_structure().
For example, for the glycan Man(a1-3)[Man(a1-6)]Man(b1-4)GlcNAc(b1-4)GlcNAc(b1-,
the vertices are "Man", "Man", "Man", "GlcNAc", "GlcNAc",
and the linkages are "a1-3", "a1-6", "b1-4", "b1-4".
NA Support
Glycan structure vectors support NA values for representing missing or unknown structures:
Create with
glycan_structure(NA)orglycan_structure(NULL)Combine with valid structures:
c(struct1, NA, struct2)Convert from character:
as_glycan_structure(c("Glc(a1-", NA))smapfunctions skip NA elements gracefullyis.na()returnsTRUEfor NA elements
Naming Support
Glycan structure vectors can have names, which are preserved during operations.
This is particularly useful when working with the glymotif package.
As characters
One side-effect of the current implementation is that you can treat a glycan structure vector
as a pure character vector of IUPAC-condensed strings.
In fact, is.character() returns TRUE for a glycan structure vector,
and all stringr functions work directly on the vector.
However, we still recommend using as.character() to explicitly convert to character when needed,
to avoid confusion and ensure that the intended behavior is clear.
Examples
library(igraph)
# Example 1: Create a simple glycan structure GlcNAc(b1-4)GlcNAc
graph <- make_graph(~ 1-+2) # Create graph with two monosaccharides
V(graph)$mono <- c("GlcNAc", "GlcNAc") # Set monosaccharide types
V(graph)$sub <- "" # No substituents
E(graph)$linkage <- "b1-4" # b1-4 glycosidic linkage
graph$anomer <- "a1" # a anomeric carbon
# Create glycan structure vector
simple_struct <- glycan_structure(graph)
print(simple_struct)
#> <glycan_structure[1]>
#> [1] GlcNAc(b1-4)GlcNAc(a1-
#> # Unique structures: 1
# Example 2: Use predefined glycan core structures
n_core <- n_glycan_core() # N-glycan core structure
o_core1 <- o_glycan_core_1() # O-glycan Core 1 structure
# Example 3: Create complex structure with substituents
complex_graph <- make_graph(~ 1-+2-+3)
V(complex_graph)$mono <- c("GlcNAc", "Gal", "Neu5Ac")
V(complex_graph)$sub <- c("", "", "") # Add substituents as needed
E(complex_graph)$linkage <- c("b1-4", "a2-3")
complex_graph$anomer <- "b1"
complex_struct <- glycan_structure(complex_graph)
print(complex_struct)
#> <glycan_structure[1]>
#> [1] Neu5Ac(a2-3)Gal(b1-4)GlcNAc(b1-
#> # Unique structures: 1
# Example 4: Check if object is a glycan structure
is_glycan_structure(simple_struct) # TRUE
#> [1] TRUE
is_glycan_structure(graph) # FALSE
#> [1] FALSE
