Skip to contents

glycan_structure() creates an efficient glycan structure vector for storing and processing glycan molecular structures. The function employs hash-based deduplication mechanisms, making it suitable for glycoproteomics, glycomics analysis, and glycan structure comparison studies.

Usage

glycan_structure(...)

is_glycan_structure(x)

Arguments

...

igraph graph objects to be converted to glycan structures, or existing glycan structure vectors. Supports mixed input of multiple objects.

x

An object to check or convert.

Value

A glyrepr_structure class glycan structure vector object.

Data Structure Overview

A glycan structure vector is a vctrs record with an additional S3 class glyrepr_structure. Therefore, sloop::s3_class() returns the class hierarchy c("glyrepr_structure", "vctrs_rcrd").

Each glycan structure must satisfy the following constraints:

Graph Structure Requirements

  • Must be a directed graph with an outward tree structure (reducing end as root)

  • Must have a graph attribute anomer in the format "a1" or "b1"

    • Unknown parts can be represented with "?", e.g., "?1", "a?", "??"

Node Attributes

  • mono: Monosaccharide names, must be known monosaccharide types

    • Generic names: Hex, HexNAc, dHex, NeuAc, etc.

    • Concrete names: Glc, Gal, Man, GlcNAc, etc.

    • Cannot mix generic and concrete names

    • NA values are not allowed

  • sub: Substituent information

    • Single substituent format: "xY" (x = position, Y = substituent name), e.g., "2Ac", "3S"

    • Multiple substituents separated by commas and ordered by position, e.g., "3Me,4Ac", "2S,6P"

    • No substituents represented by empty string ""

Edge Attributes

  • linkage: Glycosidic linkage information in format "a/bX-Y"

    • Standard format: e.g., "b1-4", "a2-3"

    • Unknown positions allowed: "a1-?", "b?-3", "??-?"

    • Partially unknown positions: "a1-3/6", "a1-3/6/9"

    • NA values are not allowed

Node and Edge Order

The indices of vertices and linkages in a glycan correspond directly to their order in the IUPAC-condensed string, which is printed when you print a glycan_structure(). For example, for the glycan Man(a1-3)[Man(a1-6)]Man(b1-4)GlcNAc(b1-4)GlcNAc(b1-, the vertices are "Man", "Man", "Man", "GlcNAc", "GlcNAc", and the linkages are "a1-3", "a1-6", "b1-4", "b1-4".

NA Support

Glycan structure vectors support NA values for representing missing or unknown structures:

  • Create with glycan_structure(NA) or glycan_structure(NULL)

  • Combine with valid structures: c(struct1, NA, struct2)

  • Convert from character: as_glycan_structure(c("Glc(a1-", NA))

  • smap functions skip NA elements gracefully

  • is.na() returns TRUE for NA elements

Naming Support

Glycan structure vectors can have names, which are preserved during operations. This is particularly useful when working with the glymotif package.

As characters

One side-effect of the current implementation is that you can treat a glycan structure vector as a pure character vector of IUPAC-condensed strings. In fact, is.character() returns TRUE for a glycan structure vector, and all stringr functions work directly on the vector.

However, we still recommend using as.character() to explicitly convert to character when needed, to avoid confusion and ensure that the intended behavior is clear.

Examples

library(igraph)

# Example 1: Create a simple glycan structure GlcNAc(b1-4)GlcNAc
graph <- make_graph(~ 1-+2)  # Create graph with two monosaccharides
V(graph)$mono <- c("GlcNAc", "GlcNAc")  # Set monosaccharide types
V(graph)$sub <- ""  # No substituents
E(graph)$linkage <- "b1-4"  # b1-4 glycosidic linkage
graph$anomer <- "a1"  # a anomeric carbon

# Create glycan structure vector
simple_struct <- glycan_structure(graph)
print(simple_struct)
#> <glycan_structure[1]>
#> [1] GlcNAc(b1-4)GlcNAc(a1-
#> # Unique structures: 1

# Example 2: Use predefined glycan core structures
n_core <- n_glycan_core()  # N-glycan core structure
o_core1 <- o_glycan_core_1()  # O-glycan Core 1 structure

# Example 3: Create complex structure with substituents
complex_graph <- make_graph(~ 1-+2-+3)
V(complex_graph)$mono <- c("GlcNAc", "Gal", "Neu5Ac")
V(complex_graph)$sub <- c("", "", "")  # Add substituents as needed
E(complex_graph)$linkage <- c("b1-4", "a2-3")
complex_graph$anomer <- "b1"

complex_struct <- glycan_structure(complex_graph)
print(complex_struct)
#> <glycan_structure[1]>
#> [1] Neu5Ac(a2-3)Gal(b1-4)GlcNAc(b1-
#> # Unique structures: 1

# Example 4: Check if object is a glycan structure
is_glycan_structure(simple_struct)  # TRUE
#> [1] TRUE
is_glycan_structure(graph)          # FALSE
#> [1] FALSE