These functions add motif annotations to the variable information
of a glyexp::experiment(), or to a tibble with a structure column.
add_motifs_int() adds integer annotations (how many times each motif occurs).
add_motifs_lgl() adds logical annotations (whether each motif is present).
Usage
add_motifs_int(
x,
motifs,
alignments = NULL,
ignore_linkages = FALSE,
strict_sub = TRUE,
...
)
add_motifs_lgl(
x,
motifs,
alignments = NULL,
ignore_linkages = FALSE,
strict_sub = TRUE,
...
)
# S3 method for class 'glyexp_experiment'
add_motifs_int(
x,
motifs,
alignments = NULL,
ignore_linkages = FALSE,
strict_sub = TRUE,
...
)
# S3 method for class 'glyexp_experiment'
add_motifs_lgl(
x,
motifs,
alignments = NULL,
ignore_linkages = FALSE,
strict_sub = TRUE,
...
)
# S3 method for class 'data.frame'
add_motifs_int(
x,
motifs,
alignments = NULL,
ignore_linkages = FALSE,
strict_sub = TRUE,
...
)
# S3 method for class 'data.frame'
add_motifs_lgl(
x,
motifs,
alignments = NULL,
ignore_linkages = FALSE,
strict_sub = TRUE,
...
)
Arguments
- x
A glyexp::experiment() object, or a tibble with a structure column.
- motifs
One of:
  - A glyrepr::glycan_structure() vector.
  - A character vector of glycan structure strings supported by glyparse::auto_parse().
  - A character vector of motif names (use all_motifs() to see all available motifs).
- alignments
A character vector specifying the alignment type for each motif. Can be a single value (applied to all motifs) or a vector of the same length as motifs.
- ignore_linkages
A logical value. If TRUE, linkages are ignored in the comparison. Default is FALSE.
- strict_sub
A logical value. If TRUE (default), substituents are matched strictly: if a residue in the glycan carries a substituent, the motif must have the same substituent on that residue to match.
- ...
Additional arguments passed to the method.
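As a rough illustration of how these arguments combine, the sketch below passes two named motifs with a single alignment value and with linkages ignored. It assumes these functions are exported by a package named glymotif, and that "substructure" is an accepted alignment type; neither is stated on this page, so check the documentation of have_motifs() before relying on them.

```r
library(glyexp)
library(glymotif)  # assumption: the package that exports add_motifs_lgl()

exp <- real_experiment2

# Named character vector: the names become the new column names.
motifs <- c(
  lacnac = "Gal(??-?)GlcNAc(??-",
  sia_lacnac = "Neu5Ac(??-?)Gal(??-?)GlcNAc(??-"
)

annotated <- exp |>
  add_motifs_lgl(
    motifs,
    alignments = "substructure",  # assumed value; one value is applied to all motifs
    ignore_linkages = TRUE        # compare residues only, ignoring linkages
  )

get_var_info(annotated)
```

To use a different alignment per motif, alignments would instead be a character vector of the same length as motifs.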
Value
A glyexp::experiment() object with motif annotations added to the variable information.
About Names
The naming rule for the new columns is similar to that of have_motifs().
Briefly, you can pass a named character vector as motifs,
and the names will be used as the new column names.
The only catch is that you cannot pass a named glyrepr::glycan_structure() to motifs.
This is a fundamental limitation of the vctrs_rcrd class,
which glyrepr::glycan_structure() is built on.
Why do we need these functions
Adding one motif annotation to a glyexp::experiment() is easy:
exp |>
  mutate_var(has_hex = have_motif(glycan_structure, "Hex"))
However, adding multiple motifs is not as straightforward.
You can still use mutate_var() to add multiple motifs like this:
exp |>
  mutate_var(
    n_hex = count_motif(glycan_structure, "Hex"),
    n_dhex = count_motif(glycan_structure, "dHex"),
    n_hexnac = count_motif(glycan_structure, "HexNAc"),
  )
This method has two problems:
- It requires a lot of boilerplate code (a lot of typing).
- It is not very efficient, as each call to count_motif() performs validation and conversion on glycan_structure, which is a time-consuming process.
Therefore, we think it would be better to have a function that adds multiple motif annotations in a single call, in a more intuitive way. That's why we provide these two functions.
Under the hood, they use a more straightforward approach for glyexp::experiment() objects:
- Get the motif annotation matrix using count_motifs() or have_motifs().
- Convert the matrix to a tibble.
- Use dplyr::bind_cols() to add the tibble to the variable information.
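The three steps above can be approximated by hand as follows. This is a minimal sketch and not the package's actual implementation: it assumes the functions live in a package named glymotif, that count_motifs() returns a matrix whose columns are named after the motifs, and it stops at the annotated tibble without writing it back into the experiment.

```r
library(glyexp)
library(glymotif)  # assumption: the package that exports count_motifs()

exp <- real_experiment2
var_info <- get_var_info(exp)

motifs <- c(
  lacnac = "Gal(??-?)GlcNAc(??-",
  sia_lacnac = "Neu5Ac(??-?)Gal(??-?)GlcNAc(??-"
)

# Step 1: one call yields a glycans-by-motifs matrix, so validation and
# conversion of the structures are not repeated for every motif.
motif_mat <- count_motifs(var_info$glycan_structure, motifs)

# Step 2: convert the matrix to a tibble (assumed: one column per motif name).
motif_tbl <- tibble::as_tibble(motif_mat)

# Step 3: append the motif columns to the variable information.
annotated_var_info <- dplyr::bind_cols(var_info, motif_tbl)
annotated_var_info
```

Swapping count_motifs() for have_motifs() in step 1 would give logical columns instead of counts, mirroring add_motifs_lgl().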
Examples
library(glyexp)
exp <- real_experiment2
exp |>
add_motifs_lgl(c(
lacnac = "Gal(??-?)GlcNAc(??-",
sia_lacnac = "Neu5Ac(??-?)Gal(??-?)GlcNAc(??-"
)) |>
get_var_info()
#> # A tibble: 67 × 5
#> variable glycan_composition glycan_structure lacnac sia_lacnac
#> <chr> <comp> <struct> <lgl> <lgl>
#> 1 V1 Man(3)GlcNAc(3) GlcNAc(?1-?)Man… FALSE FALSE
#> 2 V2 Man(3)GlcNAc(7) GlcNAc(?1-?)[Gl… FALSE FALSE
#> 3 V3 Man(5)GlcNAc(2) Man(?1-?)[Man(?… FALSE FALSE
#> 4 V4 Man(4)Gal(2)GlcNAc(4)Neu5Ac(2) Neu5Ac(?2-?)Gal… TRUE TRUE
#> 5 V5 Man(3)Gal(1)GlcNAc(3) Gal(?1-?)GlcNAc… TRUE FALSE
#> 6 V6 Man(3)Gal(2)GlcNAc(4)Fuc(2) Gal(?1-?)GlcNAc… TRUE FALSE
#> 7 V7 Man(3)GlcNAc(3)Fuc(1) GlcNAc(?1-?)Man… FALSE FALSE
#> 8 V8 Man(3)GlcNAc(4) GlcNAc(?1-?)Man… FALSE FALSE
#> 9 V9 Man(3)Gal(2)GlcNAc(5)Neu5Ac(1) Neu5Ac(?2-?)Gal… TRUE TRUE
#> 10 V10 Man(3)Gal(1)GlcNAc(5)Fuc(1)Neu5A… Neu5Ac(?2-?)Gal… TRUE TRUE
#> # ℹ 57 more rows
exp |>
add_motifs_int(c(
lacnac = "Gal(??-?)GlcNAc(??-",
sia_lacnac = "Neu5Ac(??-?)Gal(??-?)GlcNAc(??-"
)) |>
get_var_info()
#> # A tibble: 67 × 5
#> variable glycan_composition glycan_structure lacnac sia_lacnac
#> <chr> <comp> <struct> <int> <int>
#> 1 V1 Man(3)GlcNAc(3) GlcNAc(?1-?)Man… 0 0
#> 2 V2 Man(3)GlcNAc(7) GlcNAc(?1-?)[Gl… 0 0
#> 3 V3 Man(5)GlcNAc(2) Man(?1-?)[Man(?… 0 0
#> 4 V4 Man(4)Gal(2)GlcNAc(4)Neu5Ac(2) Neu5Ac(?2-?)Gal… 2 2
#> 5 V5 Man(3)Gal(1)GlcNAc(3) Gal(?1-?)GlcNAc… 1 0
#> 6 V6 Man(3)Gal(2)GlcNAc(4)Fuc(2) Gal(?1-?)GlcNAc… 2 0
#> 7 V7 Man(3)GlcNAc(3)Fuc(1) GlcNAc(?1-?)Man… 0 0
#> 8 V8 Man(3)GlcNAc(4) GlcNAc(?1-?)Man… 0 0
#> 9 V9 Man(3)Gal(2)GlcNAc(5)Neu5Ac(1) Neu5Ac(?2-?)Gal… 2 1
#> 10 V10 Man(3)Gal(1)GlcNAc(5)Fuc(1)Neu5A… Neu5Ac(?2-?)Gal… 1 1
#> # ℹ 57 more rows
