
Create a Glycan Structure Vector
glycan_structure.Rd
A vector of glycan structures with efficient storage using hash-based deduplication.
Details
The underlying implementation uses hash values of IUPAC codes of the glycan structures. This prevents redundant storage and computation, which is very useful for glycan structures.
Each glycan structure is a directed graph modeling of a glycan structure, where nodes are monosaccharides and edges are linkages.
S3 Classes
A glycan structure vector is a vctrs record with additional S3 class glyrepr_structure
.
Therefore, sloop::s3_class()
of a glycan structure vector is
c("glyrepr_structure", "vctrs_rcrd")
.
Constraints for individual structures:
The graph must be directed and an outward tree (reducing end as root).
The graph must have a graph attribute
anomer
, in the form of "a1".The graph must have a graph attribute
alditol
, a logical value.The graph must have a vertex attribute
mono
for monosaccharide names.The graph must have a vertex attribute
sub
for substituents.The graph must have an edge attribute
linkage
for linkages.Monosaccharide name must be known, either generic (Hex, HexNAc, etc.) or concrete (Glc, Gal, etc.), but not a mixture of both. NA is not allowed.
Substituent must be valid. For single substituents, use the form "xY", where x is position and Y is substituent name, e.g. "2Ac", "3S", etc. For multiple substituents, separate them with commas and order by position, e.g. "3Me,4Ac", "2S,6P", etc. Empty string "" means no substituents.
Linkages must be valid, in the form of "a/bX-Y", where X and Y are integers, e.g. "b1-4", "a2-3", etc. NA is not allowed.
Examples
library(igraph)
# A simple glycan structure: GlcNAc(b1-4)GlcNAc
graph <- make_graph(~ 1-+2) # 1 and 2 are monosaccharides
V(graph)$mono <- c("GlcNAc", "GlcNAc")
V(graph)$sub <- ""
E(graph)$linkage <- "b1-4"
graph$anomer <- "a1"
graph$alditol <- FALSE
# Create glycan structure vector
glycan_structure(graph)
#> <glycan_structure[1]>
#> [1] GlcNAc(b1-4)GlcNAc(a1-
#> # Unique structures: 1
# Create vector with multiple structures
glycan_structure(graph, o_glycan_core_1())
#> <glycan_structure[2]>
#> [1] GlcNAc(b1-4)GlcNAc(a1-
#> [2] Gal(b1-3)GalNAc(a1-
#> # Unique structures: 2