
Parse IUPAC-short-style structure strings into glycan graphs. For more information about the IUPAC-short format, see https://doi.org/10.1351/pac199668101919.

Usage

parse_iupac_short(x)

Arguments

x

A character vector of IUPAC-short strings.

Value

A glycan graph if x is a single string, or a list of glycan graphs if x is a character vector with more than one element.

Details

The IUPAC-short notation is a compact form of IUPAC-condensed notation. It is rarely used in databases, but appears often in the literature because it is concise. Compared with IUPAC-condensed notation, IUPAC-short notation omits the anomeric carbon positions, assuming they are known for common monosaccharides. For example, "Neu5Aca3Gala-" assumes the anomeric carbon of Neu5Ac is C2 (so the linkage is a2-3). In addition, the parentheses around linkages are omitted, and parentheses are used instead to indicate branching, e.g. "Neu5Aca3Gala3(Fuca3)GlcNAcb-".
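To make the relationship between the two notations concrete, here is a minimal base-R sketch that expands an IUPAC-short string into IUPAC-condensed form. It is an illustration only, not the implementation used by parse_iupac_short(). The names anomeric_carbon and expand_short are hypothetical, and the lookup table assumes that only the listed sialic acids use C2 while every other residue uses C1.

```r
# Hypothetical lookup of anomeric carbons (an assumption for this sketch).
# Residues not listed here are assumed to have their anomeric carbon at C1.
anomeric_carbon <- c(Neu5Ac = "2", Neu5Gc = "2", Kdn = "2")

# Hypothetical helper: expand one IUPAC-short string to IUPAC-condensed style.
expand_short <- function(x) {
  # Branch parentheses in IUPAC-short become square brackets in IUPAC-condensed.
  x <- chartr("()", "[]", x)
  # One token = residue name + anomer (a, b, ?) + linkage position or "-",
  # followed by the next residue, a branch bracket, or the end of the string.
  pattern <- "([A-Z][A-Za-z0-9]*?)([ab?])([1-9?-])(?=[A-Z\\[\\]]|$)"
  m <- gregexpr(pattern, x, perl = TRUE)
  tokens <- regmatches(x, m)[[1]]
  expanded <- vapply(tokens, function(tok) {
    n <- nchar(tok)
    name <- substr(tok, 1, n - 2)
    anomer <- substr(tok, n - 1, n - 1)
    pos <- substr(tok, n, n)
    carbon <- if (name %in% names(anomeric_carbon)) anomeric_carbon[[name]] else "1"
    if (pos == "-") {
      # Reducing end: the linkage is left open, as in "GlcNAc(b1-".
      paste0(name, "(", anomer, carbon, "-")
    } else {
      paste0(name, "(", anomer, carbon, "-", pos, ")")
    }
  }, character(1), USE.NAMES = FALSE)
  # Write the expanded tokens back into their original positions.
  regmatches(x, m) <- list(expanded)
  x
}

expand_short("Neu5Aca3Gala3(Fuca3)GlcNAcb-")
#> [1] "Neu5Ac(a2-3)Gal(a1-3)[Fuc(a1-3)]GlcNAc(b1-"
```

The sketch handles one string at a time and relies on a simple regular expression, so residue names with substituents or unusual spellings may need a proper tokenizer.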

As in IUPAC-condensed notation, the reducing-end monosaccharide can be written with or without anomer information. For example, both strings below are valid:

  • "Neu5Aca-"

  • "Neu5Ac"

In the first case, the anomer is "a2". In the second case, the anomer is "?2".
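The rule for the reducing end can be sketched in base R as follows. This is a hedged illustration rather than the package's own code: reducing_end_anomer and anomeric_carbon are hypothetical names, the function expects only the reducing-end residue (not a full structure), and residues missing from the lookup table are assumed to use C1.

```r
# Hypothetical lookup of anomeric carbons (an assumption for this sketch).
anomeric_carbon <- c(Neu5Ac = "2", Neu5Gc = "2", Kdn = "2")

# Hypothetical helper: derive the anomer of a single reducing-end residue.
reducing_end_anomer <- function(residue) {
  if (grepl("[ab?]-$", residue)) {
    # Anomer given explicitly, e.g. "Neu5Aca-": split off the trailing "a-".
    n <- nchar(residue)
    name <- substr(residue, 1, n - 2)
    config <- substr(residue, n - 1, n - 1)
  } else {
    # No anomer information, e.g. "Neu5Ac": the configuration is unknown.
    name <- residue
    config <- "?"
  }
  carbon <- if (name %in% names(anomeric_carbon)) anomeric_carbon[[name]] else "1"
  paste0(config, carbon)
}

reducing_end_anomer("Neu5Aca-")
#> [1] "a2"
reducing_end_anomer("Neu5Ac")
#> [1] "?2"
```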

Examples

iupac <- "Neu5Aca3Gala3(Fuca3)GlcNAcb-"
parse_iupac_short(iupac)
#> <glycan_structure[1]>
#> [1] Neu5Ac(a2-3)Gal(a1-3)[Fuc(a1-3)]GlcNAc(b1-
#> # Unique structures: 1