
These functions are closely related to have_motif(). However, instead of returning logical values, they return how many times each motif occurs in each glycan.

  • count_motif() counts a single motif in multiple glycans

  • count_motifs() counts multiple motifs in multiple glycans

Usage

count_motif(glycans, motif, alignment = NULL, ignore_linkages = FALSE)

count_motifs(glycans, motifs, alignments = NULL, ignore_linkages = FALSE)

Arguments

glycans

A 'glyrepr_structure' object, or a glycan structure string vector. All formats supported by glyparse::auto_parse() are accepted, including IUPAC-condensed, WURCS, GlycoCT, and others.

motif

A 'glyrepr_structure' object, a glycan structure string, or a known motif name (use all_motifs() to see all available motifs). For glycan structure strings, all formats supported by glyparse::auto_parse() are accepted, including IUPAC-condensed, WURCS, GlycoCT, and others.

alignment

A character string. Possible values are "substructure", "core", "terminal" and "whole". If not provided, the value will be decided based on the motif argument. If motif is a motif name, the alignment in the database will be used. Otherwise, "substructure" will be used.

ignore_linkages

A logical value. If TRUE, linkages will be ignored in the comparison.

motifs

A character vector of motif names, glycan structure strings, or a 'glyrepr_structure' object. For glycan structure strings, all formats supported by glyparse::auto_parse() are accepted, including IUPAC-condensed, WURCS, GlycoCT, and others.

alignments

A character vector specifying alignment types for each motif. Can be a single value (applied to all motifs) or a vector of the same length as motifs.

Value

  • count_motif(): An integer vector indicating how many times each glycan has the motif.

  • count_motifs(): An integer matrix where rows correspond to glycans and columns correspond to motifs. Row names contain glycan identifiers and column names contain motif identifiers.

Details

These functions run the VF2 subgraph isomorphism algorithm to find all possible matches between the glycans and the motif. However, the returned count is not necessarily the raw number of matches.

Consider the following example:

  • glycan: Gal(b1-?)[Gal(b1-?)]GlcNAc(b1-4)GlcNAc(b1-

  • motif: Gal(b1-?)[Gal(b1-?)]GlcNAc(b1-

To draw the glycan out:

Gal 1
   \ b1-? b1-4
    GlcNAc -- GlcNAc b1-
   / b1-?
Gal 2

To draw the motif out:

Gal 1
   \ b1-?
    GlcNAc b1-
   / b1-?
Gal 2

To differentiate the galactoses, we number them as "Gal 1" and "Gal 2" in both the glycan and the motif. The VF2 subgraph isomorphism algorithm will return two matches:

  • Gal 1 in the glycan matches Gal 1 in the motif, and Gal 2 matches Gal 2.

  • Gal 1 in the glycan matches Gal 2 in the motif, and Gal 2 matches Gal 1.

However, from a biological perspective, the two matches are the same occurrence of the motif. These functions account for this and return the number of unique matches, which is 1 in this example.
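The difference between raw and unique matches can be sketched in base R. This is only an illustration, not the package's implementation: it assumes that two matches count as the same occurrence when they cover the same set of glycan residues, and the node ids and the helper match_key() are hypothetical.

```r
# Hypothetical node ids for the glycan drawn above:
# 1 = Gal 1, 2 = Gal 2, 3 = GlcNAc carrying both Gal, 4 = reducing-end GlcNAc.
# Each raw match maps motif residues (names) to glycan node ids (values).
raw_matches <- list(
  c(gal1 = 1L, gal2 = 2L, glcnac = 3L),  # Gal 1 -> Gal 1, Gal 2 -> Gal 2
  c(gal1 = 2L, gal2 = 1L, glcnac = 3L)   # Gal 1 -> Gal 2, Gal 2 -> Gal 1
)

# Assumption: a match is identified by the set of glycan nodes it covers,
# so sorting the ids gives a key that ignores which motif Gal maps where.
match_key <- function(m) paste(sort(unname(m)), collapse = "-")

keys <- vapply(raw_matches, match_key, character(1))

n_raw <- length(raw_matches)      # 2 raw matches
n_unique <- length(unique(keys))  # 1 unique match

print(c(raw = n_raw, unique = n_unique))
```

Both raw matches collapse to the same key, so only one unique occurrence remains.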

For other details about the handling of monosaccharides, linkages, alignment, substituents, and implementation, see have_motif().

About Names

have_motif() and count_motif() return a vector with no names. The elements are in the same order as the input glycans, so it is easy to trace them back to the original glycans.

have_motifs() and count_motifs() return a matrix with both row and column names. The row names are the glycan names, and the column names are the motif names. The names are decided according to the following rules:

  1. If glycans or motifs is a glyrepr::glycan_structure() object, the names are the IUPAC-condensed structure strings. (Unfortunately, due to the constraints of the vctrs package that glyrepr::glycan_structure() is built on, a glyrepr::glycan_structure() vector cannot have names.)

  2. If glycans or motifs is a character vector, of either IUPAC-condensed structure strings or motif names, the names of the character vector are used if they exist; otherwise the character vector itself is used as the names.
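Rule 2 can be sketched in base R as follows. The helper resolve_names() is hypothetical and is not part of the package; it only mirrors the rule as stated above for character input, and the zero-filled matrix is a placeholder rather than real counts.

```r
# Hypothetical helper mirroring rule 2: prefer the vector's own names,
# otherwise fall back to the strings themselves.
resolve_names <- function(x) {
  stopifnot(is.character(x))
  nms <- names(x)
  if (is.null(nms)) unname(x) else nms
}

unnamed_motifs <- c("Gal(b1-", "Man(b1-")
named_motifs <- c(gal = "Gal(b1-", man = "Man(b1-")

resolve_names(unnamed_motifs)  # the strings themselves
resolve_names(named_motifs)    # "gal" "man"

# Placeholder matrix showing where the resolved names would end up.
glycans <- c(g1 = "Gal(b1-3)Gal(b1-3)GalNAc(b1-")
shape <- matrix(
  0L,
  nrow = length(glycans),
  ncol = length(named_motifs),
  dimnames = list(resolve_names(glycans), resolve_names(named_motifs))
)
print(dimnames(shape))
```

Naming the input vectors is therefore a convenient way to get short row and column labels instead of full structure strings.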

Examples

library(glymotif)
library(glyparse)

count_motif("Gal(b1-3)Gal(b1-3)GalNAc(b1-", "Gal(b1-")
#> [1] 2
count_motif(
  "Man(b1-?)[Man(b1-?)]GalNAc(b1-4)GlcNAc(b1-",
  "Man(b1-?)[Man(b1-?)]GalNAc(b1-"
)
#> [1] 1
count_motif("Gal(b1-3)Gal(b1-", "Man(b1-")
#> [1] 0

# Vectorized usage with single motif
count_motif(c("Gal(b1-3)Gal(b1-3)GalNAc(b1-", "Gal(b1-3)GalNAc(b1-"), "Gal(b1-")
#> [1] 2 1

# Multiple motifs with count_motifs()
glycan1 <- parse_iupac_condensed("Gal(b1-3)Gal(b1-3)GalNAc(b1-")
glycan2 <- parse_iupac_condensed("Man(b1-?)[Man(b1-?)]GalNAc(b1-4)GlcNAc(b1-")
glycans <- c(glycan1, glycan2)

motifs <- c("Gal(b1-3)GalNAc(b1-", "Gal(b1-", "Man(b1-")
result <- count_motifs(glycans, motifs)
print(result)
#>                                            Gal(b1-3)GalNAc(b1- Gal(b1- Man(b1-
#> Gal(b1-3)Gal(b1-3)GalNAc(b1-                                 1       2       0
#> Man(b1-?)[Man(b1-?)]GalNAc(b1-4)GlcNAc(b1-                   0       0       2

# Monosaccharide type matching examples
# Concrete glycan vs generic motif: matches (glycan converted to generic)
count_motif("Man(?1-", "Hex(?1-")  # Returns 1
#> [1] 1

# Generic glycan vs concrete motif: doesn't match
count_motif("Hex(?1-", "Man(?1-")  # Returns 0
#> [1] 0

# Matrix example showing type matching rules
count_motifs(glycans = c("Hex(?1-", "Man(?1-"), motifs = c("Hex(?1-", "Man(?1-"))
#>         Hex(?1- Man(?1-
#> Hex(?1-       1       0
#> Man(?1-       1       1