Add Motif Annotations to an Experiment

These functions add motif annotations to the variable information of a glyexp::experiment(). add_motifs_int() adds integer annotations (how many times each motif occurs). add_motifs_lgl() adds logical annotations (whether each motif is present).

Usage

add_motifs_int(exp, motifs, alignments = NULL, ignore_linkages = FALSE)

add_motifs_lgl(exp, motifs, alignments = NULL, ignore_linkages = FALSE)

Arguments

exp: A glyexp::experiment() object.
motifs: A character vector of motif names, IUPAC-condensed structure strings, or a 'glyrepr_structure' object.
alignments: A character vector specifying alignment types for each motif. Can be a single value (applied to all motifs) or a vector of the same length as motifs.
ignore_linkages: A logical value. If TRUE, linkages will be ignored in the comparison.

Value

A glyexp::experiment() object with motif annotations added to the variable information.

About Names

The naming rule for the new columns is similar to that of have_motifs(). Briefly, you can pass a named character vector as motifs, and its names will be used as the new column names. The only catch is that you cannot pass a named glyrepr::glycan_structure() to motifs. This is a fundamental limitation of the vctrs_rcrd class, which glyrepr::glycan_structure() is built on.

Why do we need these functions

Adding one motif annotation to a glyexp::experiment() is easy:

exp |>
  mutate_var(has_hex = have_motif(glycan_structure, "Hex"))

However, adding multiple motifs is not as straightforward. You can still use mutate_var() to add multiple motifs like this:

exp |>
  mutate_var(
    n_hex = count_motif(glycan_structure, "Hex"),
    n_dhex = count_motif(glycan_structure, "dHex"),
    n_hexnac = count_motif(glycan_structure, "HexNAc"),
  )

This method has two problems:

It requires a lot of boilerplate code (a lot of typing).
It is not very efficient, because each call to count_motif() validates and converts glycan_structure again, which is time-consuming.

Advanced R users might want to use count_motifs() (the plural cousin of count_motif()) with !!!:

exp |>
  mutate_var(!!!count_motifs(glycan_structure, c("Hex", "dHex", "HexNAc")))

Sadly, this doesn't work. Firstly, count_motifs() returns a matrix, not a list. Secondly, even if you use as.data.frame() to convert the matrix to a list, !!! forces count_motifs(glycan_structure, ...) to be evaluated early, in the calling environment, where no object named glycan_structure exists. This raises an "object not found" error, and there is no easy way to fix this, at least for now.
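The early-evaluation problem can be reproduced without any glycan packages. The following is a minimal sketch using plain dplyr; toy_counts() and the glycan column are hypothetical stand-ins for count_motifs() and glycan_structure, and are not part of this package.

```r
library(dplyr)

# Hypothetical stand-in for count_motifs(): returns a named list with one
# count vector per pattern (a list, so that `!!!` could in principle splice it).
toy_counts <- function(x, patterns) {
  counts <- lapply(patterns, function(p) {
    lengths(regmatches(x, gregexpr(p, x, fixed = TRUE)))
  })
  names(counts) <- paste0("n_", patterns)
  counts
}

df <- tibble(glycan = c("Hex", "HexHexNAc"))

# Ordinary data masking works: `glycan` is looked up inside `df`.
ok <- df |> mutate(n_hex = toy_counts(glycan, "Hex")[[1]])

# With `!!!`, the call is evaluated while mutate() captures its arguments,
# i.e. in the calling environment, where there is no object called `glycan`.
result <- tryCatch(
  df |> mutate(!!!toy_counts(glycan, c("Hex", "HexNAc"))),
  error = function(e) e
)
is_error <- inherits(result, "error")
```

Here the first mutate() call succeeds, while the second one fails before dplyr ever sees the column, which mirrors the behaviour described above.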

Therefore, we think it would be better to have a function that adds multiple motif annotations in a single call, in a more intuitive way. That's why we provide these two functions.

Under the hood, they use a more straightforward approach:

get the motif annotation matrix using count_motifs() or have_motifs()
convert the matrix to a tibble
use dplyr::bind_cols() to add the tibble to the variable information
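The three steps above can be sketched with toy data. This is only an illustration under stated assumptions, not the package's actual implementation: var_info and motif_mat are hypothetical objects, and the matrix is assumed to have one row per variable and one named column per motif.

```r
library(tibble)
library(dplyr)

# Hypothetical variable information table (stand-in for the experiment's own).
var_info <- tibble(variable = c("V1", "V2", "V3"))

# Step 1 (assumed shape): an annotation matrix such as count_motifs() might
# return, filled column by column here with made-up counts.
motif_mat <- matrix(
  c(3L, 5L, 4L,
    0L, 1L, 1L),
  nrow = 3,
  dimnames = list(NULL, c("n_hex", "n_dhex"))
)

# Step 2: convert the matrix to a tibble; column names carry over.
motif_tbl <- as_tibble(motif_mat)

# Step 3: bind the new columns to the variable information, row by row.
annotated <- bind_cols(var_info, motif_tbl)
```

Because bind_cols() matches rows by position, this approach relies on the matrix rows being in the same order as the rows of the variable information.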
