These functions check if the given glycans have the given motif(s).
- have_motif() checks a single motif against multiple glycans.
- have_motifs() checks multiple motifs against multiple glycans.
Technically speaking, they perform subgraph isomorphism tests to
determine if the motif(s) are subgraphs of the glycans.
Monosaccharides, linkages, and substituents are all considered.
Usage
have_motif(
glycans,
motif,
alignment = NULL,
ignore_linkages = FALSE,
strict_sub = TRUE,
match_degree = NULL
)
have_motifs(
glycans,
motifs,
alignments = NULL,
ignore_linkages = FALSE,
strict_sub = TRUE,
match_degree = NULL
)
Arguments
- glycans
One of:
  - A glyrepr::glycan_structure() vector.
  - A glycan structure string vector. All formats supported by glyparse::auto_parse() are accepted, including IUPAC-condensed, WURCS, GlycoCT, and others.
- motif
One of:
  - A glyrepr::glycan_structure() scalar.
  - A glycan structure string, supported by glyparse::auto_parse().
  - A known motif name (use db_motifs() to see all available motifs).
- alignment
A character string. Possible values are "substructure", "core", "terminal", and "whole". If not provided, the value is decided based on the motif argument: if motif is a motif name, the alignment recorded in the database is used; otherwise, "substructure" is used.
- ignore_linkages
A logical value. If TRUE, linkages are ignored in the comparison. Default is FALSE.
- strict_sub
A logical value. If TRUE (default), substituents are matched in strict mode: if a residue in the glycan has a substituent, the matching motif residue must have the same substituent.
- match_degree
A logical vector indicating which motif nodes must match the glycan's in- and out-degree exactly. For have_motif(), count_motif(), and match_motif(), this must be a logical vector with length 1 or the number of motif nodes (length 1 is recycled). For have_motifs(), count_motifs(), and match_motifs(), this must be a list of logical vectors with the same length as motifs; each element follows the same length rules. When match_degree is provided, alignment and alignments are silently ignored.
- motifs
One of:
  - A glyrepr::glycan_structure() vector.
  - A glycan structure string vector, supported by glyparse::auto_parse().
  - A character vector of motif names (use db_motifs() to see all available motifs).
- alignments
A character vector specifying alignment types for each motif. Can be a single value (applied to all motifs) or a vector of the same length as motifs.
Value
- have_motif(): A logical vector indicating whether each glycan has the motif.
- have_motifs(): A logical matrix where rows correspond to glycans and columns correspond to motifs. Row names contain glycan identifiers and column names contain motif identifiers.
About Names
have_motif() and count_motif() preserve names from the input glycans vector.
have_motifs() and count_motifs() return a matrix with both row and column names.
The row names are the glycan names, and the column names are the motif names.
Glycan names follow the same rule as have_motif() and count_motif().
Motif names have the following rules:
- If motifs have names, use the names.
- If motifs don't have names and are known motif names in the database (e.g. "N-glycan core"), use them.
- Otherwise, no column names are set.
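The column-name rules above can be sketched in base R. This is a minimal illustration, not the package's implementation: motif_colnames() is a hypothetical helper, known_motifs is a stand-in for the motif names stored in the database, and requiring every unnamed motif to be a known name is an assumption.

```r
# Hypothetical helper illustrating the column-name rule of have_motifs().
# `known_motifs` is a stand-in for the motif names stored in the database.
motif_colnames <- function(motifs, known_motifs) {
  # Rule 1: explicit names win
  if (!is.null(names(motifs))) {
    return(names(motifs))
  }
  # Rule 2: unnamed motifs that are known motif names label themselves
  # (assumption: every element must be a known name)
  if (is.character(motifs) && all(motifs %in% known_motifs)) {
    return(unname(motifs))
  }
  # Rule 3: otherwise, no column names
  NULL
}

known <- "N-glycan core"
motif_colnames(c(motif1 = "Gal(b1-3)GalNAc(b1-"), known)  # "motif1"
motif_colnames("N-glycan core", known)                     # "N-glycan core"
motif_colnames("Gal(b1-3)GalNAc(b1-", known)               # NULL
```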
Monosaccharide type
As of glyrepr 0.9.0.9000, all elements in a glycans or motifs vector
must have the same monosaccharide type ("concrete" or "generic").
This invariant is enforced when creating or combining glyrepr_structure objects.
The matching rules are:
When the motif is "generic", glycans are converted to "generic" type for comparison, allowing both concrete and generic glycans to match generic motifs.
When the motif is "concrete", glycans are used as-is, so only concrete glycans with matching monosaccharide names will match, while generic glycans will not match.
Examples:
- Man (concrete glycan) vs Hex (generic motif) → TRUE (Man converted to Hex for comparison)
- Hex (generic glycan) vs Man (concrete motif) → FALSE (names don't match)
- Man (concrete glycan) vs Man (concrete motif) → TRUE (exact match)
- Hex (generic glycan) vs Hex (generic motif) → TRUE (exact match)
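The type-matching rules can be sketched as follows. The lookup table generic_of and the helpers to_generic() and mono_matches() are hypothetical and cover only a handful of residues. The package's own concrete-to-generic conversion is not reproduced here.

```r
# Hypothetical, deliberately tiny concrete-to-generic lookup table
generic_of <- c(Glc = "Hex", Gal = "Hex", Man = "Hex",
                GlcNAc = "HexNAc", GalNAc = "HexNAc")

# Convert concrete names to generic ones; generic names pass through unchanged
to_generic <- function(x) {
  out <- x
  hit <- x %in% names(generic_of)
  out[hit] <- generic_of[x[hit]]
  unname(out)
}

# Compare one glycan residue with one motif residue
mono_matches <- function(glycan_mono, motif_mono,
                         motif_type = c("concrete", "generic")) {
  motif_type <- match.arg(motif_type)
  if (motif_type == "generic") {
    # Generic motif: bring the glycan residue down to the generic level first
    glycan_mono <- to_generic(glycan_mono)
  }
  # Concrete motif: the glycan residue is used as-is
  glycan_mono == motif_mono
}

mono_matches("Man", "Hex", "generic")   # TRUE
mono_matches("Hex", "Man", "concrete")  # FALSE
mono_matches("Man", "Man", "concrete")  # TRUE
mono_matches("Hex", "Hex", "generic")   # TRUE
```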
Linkages
Obscure linkages (e.g. "??-?") are allowed in the motif graph
(see glyrepr::possible_linkages()).
"?" in a motif graph means "anything could be OK",
so it will match any linkage in the glycan graph.
However, "?" in a glycan graph will only match "?" in the motif graph.
You can set ignore_linkages = TRUE to ignore linkages in the comparison.
Some examples:
"b1-?" in motif will match "b1-4" in glycan.
"b1-?" in motif will match "b1-?" in glycan.
"b1-4" in motif will NOT match "b1-?" in glycan.
"a1-?" in motif will NOT match "b1-4" in glycan.
"a1-?" in motif will NOT match "a?-4" in glycan.
Both motifs and glycans can have a "half-linkage" at the reducing end, e.g. "GlcNAc(b1-". A half-linkage in the motif matches any linkage in the glycan, as well as the glycan's own reducing-end half-linkage, as long as the anomers agree. For example, the glycan "GlcNAc(b1-4)Gal(a1-" has both the "GlcNAc(b1-" and "Gal(a1-" motifs.
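The linkage rules above compare three parts of a linkage independently: the anomer, the anomeric carbon position, and the position on the parent residue. The sketch below assumes exactly that decomposition with single-character fields. parse_linkage() and linkage_matches() are hypothetical helpers. Treating a half-linkage such as "b1-" as "b1-?" is an assumption that is consistent with the examples, since the anomer is still compared.

```r
# Split a linkage such as "b1-4" into anomer, anomeric position, and parent
# position. A reducing-end half-linkage such as "b1-" is treated as "b1-?"
# (an assumption made for this sketch).
parse_linkage <- function(x) {
  if (endsWith(x, "-")) x <- paste0(x, "?")
  parts <- regmatches(x, regexec("^([ab?])([0-9?])-([0-9?])$", x))[[1]]
  if (length(parts) != 4) stop("unrecognized linkage: ", x)
  parts[-1]
}

linkage_matches <- function(motif, glycan) {
  m <- parse_linkage(motif)
  g <- parse_linkage(glycan)
  # "?" in the motif accepts anything; any other value must be identical,
  # so a "?" in the glycan is only accepted by a "?" in the motif.
  all(m == "?" | m == g)
}

linkage_matches("b1-?", "b1-4")  # TRUE
linkage_matches("b1-4", "b1-?")  # FALSE
linkage_matches("a1-?", "a?-4")  # FALSE
linkage_matches("b1-", "a1-")    # FALSE: anomers differ
```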
Alignment
According to the GlycoMotif database, a motif can be classified into four alignment types:
"substructure": The motif can be anywhere in the glycan. This is the default. See substructure for details.
"core": The motif must align with at least one connected substructure (subtree) at the reducing end of the glycan. See glycan core for details.
"terminal": The motif must align with at least one connected substructure (subtree) at the nonreducing end of the glycan. See nonreducing end for details.
"whole": The motif must align with the entire glycan. See whole-glycan for details.
When using known motifs in the GlycoMotif GlyGen Collection,
the best practice is to not provide the alignment argument,
and let the function decide the alignment based on the motif name.
However, it is still possible to override the default alignments.
In this case, the user-provided alignments will be used,
but a warning will be issued.
When match_degree is provided, alignment and alignments are ignored
without warning.
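For an unbranched chain, the four alignment types reduce to simple positional conditions:

- "substructure" accepts any placement.
- "core" requires the motif to end at the reducing end.
- "terminal" requires it to start at the nonreducing end.
- "whole" requires it to cover the whole chain.

The sketch below uses a hypothetical helper, linear_has_motif(); it is not alignment_check(). It describes each chain only by its linkages, listed from the nonreducing end. Residue names are ignored because they are all Gal here, and so is the reducing-end half-linkage. Branching is not handled. On glycan_3 from the Examples, it gives the same four results.

```r
# Hypothetical helper: linear chains are given as character vectors of
# linkages, ordered from the nonreducing end to the reducing end.
linear_has_motif <- function(glycan_links, motif_links,
                             alignment = c("substructure", "core", "terminal", "whole")) {
  alignment <- match.arg(alignment)
  n <- length(glycan_links)
  m <- length(motif_links)
  if (m > n) return(FALSE)
  # 0-based offsets at which the motif's linkages line up with the glycan's
  offsets <- 0:(n - m)
  hits <- offsets[vapply(offsets, function(o) {
    all(glycan_links[o + seq_len(m)] == motif_links)
  }, logical(1))]
  switch(alignment,
    substructure = length(hits) > 0,  # anywhere
    core         = (n - m) %in% hits, # ends at the reducing end
    terminal     = 0 %in% hits,       # starts at the nonreducing end
    whole        = m == n && 0 %in% hits
  )
}

glycan_3_links <- c("a1-3", "a1-4", "a1-6")  # Gal(a1-3)Gal(a1-4)Gal(a1-6)Gal(a1-
motif_links <- list(
  c("a1-3", "a1-4", "a1-6"),  # Gal(a1-3)Gal(a1-4)Gal(a1-6)Gal(a1-
  c("a1-3", "a1-4"),          # Gal(a1-3)Gal(a1-4)Gal(a1-
  c("a1-4", "a1-6"),          # Gal(a1-4)Gal(a1-6)Gal(a1-
  "a1-4"                      # Gal(a1-4)Gal(a1-
)
for (a in c("whole", "core", "terminal", "substructure")) {
  print(vapply(motif_links, function(ml) linear_has_motif(glycan_3_links, ml, a),
               logical(1)))
}
```

Real glycans are trees, so a general check must consider every nonreducing end. The linear case just makes the definitions concrete.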
Degree matching
match_degree is used to require exact degree matching for specific motif nodes.
For each node marked TRUE, the matched glycan node must have the same in-degree
and out-degree as the motif node. Nodes marked FALSE do not enforce degree
equality. This is useful to prevent matches where the motif node is embedded in
a more highly branched glycan region (extra outgoing edges) or has extra incoming
connections compared to the motif.
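The degree rule can be sketched as a comparison restricted to flagged nodes. degrees_ok() is hypothetical; it is not the package's degree_check(). The example counts in- and out-degrees with edges pointing from the reducing end toward the nonreducing end, and that direction is an assumption. The length-1 recycling mirrors the description of the match_degree argument.

```r
# Hypothetical sketch of the degree rule. Row i of `motif_deg` holds the
# (in-degree, out-degree) of motif node i; row i of `glycan_deg` holds the
# degrees of the glycan node matched to it.
degrees_ok <- function(motif_deg, glycan_deg, match_degree) {
  n <- nrow(motif_deg)
  if (!(length(match_degree) %in% c(1L, n))) {
    stop("`match_degree` must have length 1 or the number of motif nodes")
  }
  flag <- rep_len(match_degree, n)  # a length-1 value is recycled
  same <- motif_deg[, 1] == glycan_deg[, 1] & motif_deg[, 2] == glycan_deg[, 2]
  all(same[flag])                   # only flagged nodes must agree
}

# Motif GlcNAc(b1-6)GalNAc matched inside O-glycan core 2 (assumed edge
# direction: reducing end -> nonreducing end). Node 1 = GlcNAc, node 2 = GalNAc;
# the glycan's GalNAc also carries a Gal, so its out-degree is 2.
motif_deg  <- rbind(c(1, 0), c(0, 1))
glycan_deg <- rbind(c(1, 0), c(0, 2))
degrees_ok(motif_deg, glycan_deg, FALSE)           # TRUE: nothing enforced
degrees_ok(motif_deg, glycan_deg, c(TRUE, FALSE))  # TRUE: GlcNAc degrees agree
degrees_ok(motif_deg, glycan_deg, c(FALSE, TRUE))  # FALSE: GalNAc is more branched
```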
Substituents
Substituents (e.g. "Ac", "SO3") are matched in strict mode. Both single and multiple substituents are supported:
Single substituents: "Neu5Ac9Ac" will only match "Neu5Ac9Ac", not "Neu5Ac"
Multiple substituents: "Glc3Me6S" (has both 3Me and 6S) will only match motifs that contain both substituents, e.g., "Glc3Me6S", "Glc?Me6S", "Glc3Me?S"
"Glc3Me6S" will NOT match "Glc3Me" (missing 6S) or "Glc" (missing both)
For multiple substituents, they are internally stored as comma-separated values (e.g. "3Me,6S") and matched individually. Each substituent in the motif must have a corresponding match in the glycan, and vice versa.
Obscure linkages in motif substituents will match any linkage in glycan substituents:
Motif "Neu5Ac?Ac" will match "Neu5Ac9Ac" in the glycan
Motif "Glc?Me6S" will match "Glc3Me6S" in the glycan (? matches 3)
Motif "Glc3Me?S" will match "Glc3Me6S" in the glycan (? matches 6)
This default behavior is reasonable for most cases,
because monosaccharides with different substituents should be regarded as different.
However, you can change this behavior by setting strict_sub = FALSE.
In this case, the substituent is optional in the motif,
so the glycan "Neu5Ac9Ac" can match the motif "Neu5Ac".
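The substituent rules for one matched residue pair can be sketched with the comma-separated representation described above. In this sketch a "?" position in the motif matches any position, and the substituent name must be identical. split_subs(), sub_matches(), and substituents_match() are hypothetical helpers. The non-strict rule is an assumption inferred from the strict_sub = FALSE examples: only the motif's substituents need partners.

```r
# Split a comma-separated substituent string such as "3Me,6S"
split_subs <- function(x) {
  if (!nzchar(x)) character(0) else strsplit(x, ",", fixed = TRUE)[[1]]
}

# Can motif substituent `m` (e.g. "?Me") match glycan substituent `g` (e.g. "3Me")?
sub_matches <- function(m, g) {
  pos  <- function(s) sub("^([0-9?]+).*$", "\\1", s)  # leading position
  name <- function(s) sub("^[0-9?]+", "", s)          # substituent name
  name(m) == name(g) && (pos(m) == "?" || pos(m) == pos(g))
}

substituents_match <- function(motif, glycan, strict = TRUE) {
  ms <- split_subs(motif)
  gs <- split_subs(glycan)
  # TRUE if every element of `a` has a partner in `b` under `f`
  all_covered <- function(a, b, f) {
    all(vapply(a, function(x) any(vapply(b, function(y) f(x, y), logical(1))),
               logical(1)))
  }
  motif_ok <- all_covered(ms, gs, sub_matches)
  if (!strict) return(motif_ok)  # extra glycan substituents are tolerated
  glycan_ok <- all_covered(gs, ms, function(g, m) sub_matches(m, g))
  motif_ok && glycan_ok
}

substituents_match("?Me,6S", "3Me,6S")           # TRUE
substituents_match("3Me", "3Me,6S")              # FALSE: 6S has no partner
substituents_match("", "9Ac", strict = FALSE)    # TRUE: motif substituent optional
```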
Implementation
Under the hood, the function uses igraph::graph.get.subisomorphisms.vf2()
to get all possible subgraph isomorphisms between glycan and motif.
A color vertex attribute is added to both graphs to distinguish monosaccharides.
For all possible matches, the function checks the following:
- Alignment: using alignment_check()
- Substituents: using substituent_check()
- Degree: using degree_check() (only when match_degree is provided)
- Linkages: using linkage_check()
- Anomer: using anomer_check()
The function returns TRUE if any of the matches passes all checks.
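The match-then-filter strategy can be illustrated without igraph. The sketch below substitutes a brute-force backtracking search for VF2. It enumerates every injective mapping of motif nodes onto glycan nodes that keeps node colors (monosaccharide names) and maps each motif edge onto a glycan edge. It then returns TRUE only if some mapping passes all extra checks. The following are all hypothetical:

- find_matches() and has_motif_sketch()
- the edge direction
- the toy "core" check

The search is exponential in the worst case and is only meant for tiny graphs.

```r
# Return every injective mapping of motif nodes onto glycan nodes that keeps
# node colors and sends each motif edge onto a glycan edge.
# Edges are two-column matrices of node indices (from, to).
find_matches <- function(g_color, g_edges, m_color, m_edges) {
  has_edge <- function(a, b) any(g_edges[, 1] == a & g_edges[, 2] == b)
  k <- length(m_color)
  found <- list()
  extend <- function(mapping) {
    i <- length(mapping) + 1
    if (i > k) {
      # All motif nodes are placed: keep the mapping if every edge survives
      edge_ok <- vapply(seq_len(nrow(m_edges)), function(e) {
        has_edge(mapping[m_edges[e, 1]], mapping[m_edges[e, 2]])
      }, logical(1))
      if (all(edge_ok)) found[[length(found) + 1]] <<- mapping
      return(invisible(NULL))
    }
    # Try every unused glycan node with the same color as motif node i
    for (v in setdiff(which(g_color == m_color[i]), mapping)) {
      extend(c(mapping, v))
    }
  }
  extend(integer(0))
  found
}

# TRUE if at least one mapping passes every extra check
has_motif_sketch <- function(g_color, g_edges, m_color, m_edges, checks = list()) {
  matches <- find_matches(g_color, g_edges, m_color, m_edges)
  any(vapply(matches, function(mp) {
    all(vapply(checks, function(check) check(mp), logical(1)))
  }, logical(1)))
}

# Toy O-glycan core 2, Gal(b1-3)[GlcNAc(b1-6)]GalNAc(a1-, with edges pointing
# from the reducing end toward the nonreducing end (an assumption)
g_color <- c("GalNAc", "Gal", "GlcNAc")
g_edges <- rbind(c(1, 2), c(1, 3))
# Motif GlcNAc(b1-6)GalNAc: node 1 = GalNAc, node 2 = GlcNAc
m_color <- c("GalNAc", "GlcNAc")
m_edges <- rbind(c(1, 2))

find_matches(g_color, g_edges, m_color, m_edges)  # one mapping: c(1, 3)
# Toy "core" check: motif node 1 must sit on the glycan's root (node 1)
has_motif_sketch(g_color, g_edges, m_color, m_edges,
                 checks = list(function(mp) mp[1] == 1))  # TRUE
```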
Examples
library(glyparse)
library(glyrepr)
(glycan <- o_glycan_core_2(mono_type = "concrete"))
#> <glycan_structure[1]>
#> [1] Gal(b1-3)[GlcNAc(b1-6)]GalNAc(a1-
#> # Unique structures: 1
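# The glycan contains Gal(b1-3)GalNAc, but its GalNAc has an alpha anomer (a1-),
# so the motif "Gal(b1-3)GalNAc(b1-" does not match
have_motif(glycan, "Gal(b1-3)GalNAc(b1-")
#> [1] FALSE
# "Gal(b1-4)GalNAc(b1-" does not match either (wrong linkage as well)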
have_motif(glycan, "Gal(b1-4)GalNAc(b1-")
#> [1] FALSE
# Set `ignore_linkages` to `TRUE` to ignore linkages
have_motif(glycan, "Gal(b1-4)GalNAc(b1-", ignore_linkages = TRUE)
#> [1] TRUE
# Generic motifs also match concrete glycans
have_motif(glycan, "Hex(b1-3)HexNAc(?1-")
#> [1] TRUE
# Obscure linkages in the `motif` graph are allowed
have_motif(glycan, "Gal(b1-?)GalNAc(?1-")
#> [1] TRUE
# However, obscure linkages in `glycan` will only match "?" in the `motif` graph
glycan_2 <- parse_iupac_condensed("Gal(b1-?)[GlcNAc(b1-6)]GalNAc(?1-")
have_motif(glycan_2, "Gal(b1-3)GalNAc(?1-")
#> [1] FALSE
have_motif(glycan_2, "Gal(b1-?)GalNAc(?1-")
#> [1] TRUE
# The anomer of the motif will be matched to linkages in the glycan
have_motif(glycan_2, "GlcNAc(b1-")
#> [1] TRUE
# Alignment types
# The default type is "substructure", which means the motif can be anywhere in the glycan.
# Other options include "core", "terminal" and "whole".
glycan_3 <- parse_iupac_condensed("Gal(a1-3)Gal(a1-4)Gal(a1-6)Gal(a1-")
motifs <- c(
"Gal(a1-3)Gal(a1-4)Gal(a1-6)Gal(a1-",
"Gal(a1-3)Gal(a1-4)Gal(a1-",
"Gal(a1-4)Gal(a1-6)Gal(a1-",
"Gal(a1-4)Gal(a1-"
)
purrr::map_lgl(motifs, ~ have_motif(glycan_3, .x, alignment = "whole"))
#> [1] TRUE FALSE FALSE FALSE
purrr::map_lgl(motifs, ~ have_motif(glycan_3, .x, alignment = "core"))
#> [1] TRUE FALSE TRUE FALSE
purrr::map_lgl(motifs, ~ have_motif(glycan_3, .x, alignment = "terminal"))
#> [1] TRUE TRUE FALSE FALSE
purrr::map_lgl(motifs, ~ have_motif(glycan_3, .x, alignment = "substructure"))
#> [1] TRUE TRUE TRUE TRUE
# Substituents
glycan_4 <- "Neu5Ac9Ac(a2-3)Gal(b1-4)GlcNAc(b1-"
glycan_5 <- "Neu5Ac(a2-3)Gal(b1-4)GlcNAc(b1-"
have_motif(glycan_4, glycan_5)
#> [1] FALSE
have_motif(glycan_5, glycan_4)
#> [1] FALSE
have_motif(glycan_4, glycan_4)
#> [1] TRUE
have_motif(glycan_5, glycan_5)
#> [1] TRUE
have_motif(glycan_4, glycan_5, strict_sub = FALSE)
#> [1] TRUE
have_motif(glycan_5, glycan_4, strict_sub = FALSE)
#> [1] FALSE
have_motif(glycan_4, glycan_4, strict_sub = FALSE)
#> [1] TRUE
have_motif(glycan_5, glycan_5, strict_sub = FALSE)
#> [1] TRUE
# Multiple substituents
glycan_6 <- "Glc3Me6S(a1-" # has both 3Me and 6S substituents
have_motif(glycan_6, "Glc3Me6S(a1-") # TRUE: exact match
#> [1] TRUE
have_motif(glycan_6, "Glc?Me6S(a1-") # TRUE: obscure linkage ?Me matches 3Me
#> [1] TRUE
have_motif(glycan_6, "Glc3Me?S(a1-") # TRUE: obscure linkage ?S matches 6S
#> [1] TRUE
have_motif(glycan_6, "Glc3Me(a1-") # FALSE: missing 6S substituent
#> [1] FALSE
have_motif(glycan_6, "Glc(a1-") # FALSE: missing all substituents
#> [1] FALSE
# Vectorization with single motif
glycans <- c(glycan, glycan_2, glycan_3)
motif <- "Gal(b1-3)GalNAc(b1-"
have_motif(glycans, motif)
#> [1] FALSE FALSE FALSE
# Multiple motifs with have_motifs()
glycan1 <- o_glycan_core_2(mono_type = "concrete")
glycan2 <- parse_iupac_condensed("Gal(b1-?)[GlcNAc(b1-6)]GalNAc(b1-")
glycans <- c(glycan1, glycan2)
motifs <- c("Gal(b1-3)GalNAc(b1-", "Gal(b1-4)GalNAc(b1-", "GlcNAc(b1-6)GalNAc(b1-")
have_motifs(glycans, motifs)
#> [,1] [,2] [,3]
#> Gal(b1-3)[GlcNAc(b1-6)]GalNAc(a1- FALSE FALSE FALSE
#> Gal(b1-?)[GlcNAc(b1-6)]GalNAc(b1- FALSE FALSE TRUE
# You can assign each motif a name
motifs <- c(
motif1 = "Gal(b1-3)GalNAc(b1-",
motif2 = "Gal(b1-4)GalNAc(b1-",
motif3 = "GlcNAc(b1-6)GalNAc(b1-"
)
have_motifs(glycans, motifs)
#> motif1 motif2 motif3
#> Gal(b1-3)[GlcNAc(b1-6)]GalNAc(a1- FALSE FALSE FALSE
#> Gal(b1-?)[GlcNAc(b1-6)]GalNAc(b1- FALSE FALSE TRUE
