Converts meaningless variable IDs (like "GP1", "V1") into meaningful, human-readable IDs based on the experiment type and available columns.
If format is not specified, the format of the new IDs depends on exp_type:
- glycomics: {glycan_composition}, e.g., "Hex(5)HexNAc(2)"
- glycoproteomics: {protein}-<site>-{glycan_composition}, or {protein}-{glycan_composition} if no protein_site column exists
- traitomics: {motif} or {trait}, depending on which column is present
- traitproteomics: {protein}-<site>-{motif} or {protein}-<site>-{trait}
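The {column_name} substitution behind these formats can be sketched in base R. This is a minimal illustration, not the package's implementation: fill_format() is a hypothetical helper, the example data frame and protein accessions are made up, and the <site> token is not handled here.

```r
# Hypothetical helper, not part of the package: fill "{column}" tokens
# in a format string from the columns of a var_info-like data frame.
fill_format <- function(var_info, format) {
  # Find every {column_name} token in the format string.
  tokens <- regmatches(format, gregexpr("\\{[^{}]+\\}", format))[[1]]
  out <- rep(format, nrow(var_info))
  for (token in unique(tokens)) {
    column <- substr(token, 2, nchar(token) - 1)
    if (!column %in% names(var_info)) {
      stop("Column not found in var_info: ", column)
    }
    # Replace the token row by row with the column's values.
    out <- mapply(
      function(id, value) gsub(token, value, id, fixed = TRUE),
      out, as.character(var_info[[column]]),
      USE.NAMES = FALSE
    )
  }
  out
}

# Made-up example data standing in for var_info.
var_info <- data.frame(
  variable = c("GP1", "GP2"),
  protein = c("P01857", "P02765"),
  glycan_composition = c("Hex(5)HexNAc(2)", "Hex(5)HexNAc(4)"),
  stringsAsFactors = FALSE
)

new_ids <- fill_format(var_info, "{protein}-{glycan_composition}")
print(new_ids)
```

Raising an error for an unknown column is a design choice of this sketch; the real function may validate the format differently.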
If duplicate IDs are generated (e.g., same composition with multiple PSMs),
a unique integer suffix is appended using the unique_suffix pattern.
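As a rough sketch of the suffixing step, the base R code below numbers duplicated IDs with a unique_suffix pattern. add_unique_suffix() is a hypothetical helper, and it assumes that IDs occurring only once are left unchanged, which the description above suggests but does not state outright.

```r
# Hypothetical helper: append a numbered suffix to duplicated IDs only.
add_unique_suffix <- function(ids, unique_suffix = "-{N}") {
  if (!grepl("{N}", unique_suffix, fixed = TRUE)) {
    stop("`unique_suffix` must contain {N}")
  }
  # Running count within each group of identical IDs (1, 2, 3, ...).
  n <- ave(seq_along(ids), ids, FUN = seq_along)
  # An ID counts as duplicated if it occurs more than once in the vector.
  is_dup <- ids %in% ids[duplicated(ids)]
  # Build the suffix for each position by filling in {N}.
  suffix <- vapply(
    n,
    function(i) sub("{N}", as.character(i), unique_suffix, fixed = TRUE),
    character(1)
  )
  # Assumption: IDs that are already unique keep their original form.
  ifelse(is_dup, paste0(ids, suffix), ids)
}

ids <- c("Hex(5)", "Hex(5)HexNAc(2)", "Hex(5)")
unique_ids <- add_unique_suffix(ids)
print(unique_ids)
```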
Usage
standardize_variable(
exp,
format = NULL,
unique_suffix = "-{N}",
fasta = NULL,
taxid = 9606
)
Arguments
- exp
An experiment().
- format
A format string specifying how to construct variable IDs. Use {column_name} to insert values from var_info columns. For example, "{gene}-{glycan_composition}" would produce "GENE1-Hex(5)". If NULL (default), a sensible format is chosen based on exp_type. Use <site> to include the amino acid and position: this special placeholder is replaced with <aa><pos> (e.g., "N32", "S44"). It requires the protein_site column and uses the following decision tree to determine the amino acid:
  - If peptide and peptide_site columns exist, extract from peptide.
  - Else if fasta is provided, extract from FASTA sequences.
  - Else fetch from UniProt using UniProt.ws.
- unique_suffix
A string pattern for making IDs unique when duplicates exist. Must contain {N}, which will be replaced with the numeric suffix (1, 2, 3, ...). Default is "-{N}", which produces IDs like "Hex(5)-1", "Hex(5)-2".
- fasta
Either a file path to a FASTA file or a named character vector with protein IDs as names and sequences as values. Used to look up amino acids for site representation when peptide columns are not available. Default: NULL (use UniProt.ws to fetch sequences).
- taxid
NCBI taxonomy ID for UniProt lookup. Default: 9606 (human).
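The first two branches of the <site> decision tree can be sketched as follows. resolve_site() is a hypothetical helper, the sequences and peptides are made up, and the sketch assumes that peptide_site and protein_site are 1-based positions; the UniProt fallback is deliberately left out.

```r
# Hypothetical helper: build "<aa><pos>" site labels such as "N32".
# Covers only the first two branches; the UniProt lookup is omitted.
resolve_site <- function(var_info, fasta = NULL) {
  if (all(c("peptide", "peptide_site") %in% names(var_info))) {
    # Branch 1: the residue sits at peptide_site within the peptide.
    aa <- substr(var_info$peptide, var_info$peptide_site, var_info$peptide_site)
  } else if (!is.null(fasta)) {
    # Branch 2: the residue sits at protein_site within the full sequence.
    sequences <- fasta[var_info$protein]
    aa <- substr(sequences, var_info$protein_site, var_info$protein_site)
  } else {
    stop("This sketch does not implement the UniProt fallback")
  }
  paste0(unname(aa), var_info$protein_site)
}

# Made-up sequences as a named character vector (protein IDs as names).
fasta <- c(P1 = "MKNGTA", P2 = "ASNLSK")

var_info <- data.frame(
  protein = c("P1", "P2"),
  protein_site = c(3L, 5L),
  stringsAsFactors = FALSE
)
sites_from_fasta <- resolve_site(var_info, fasta = fasta)

# With peptide columns present, the peptide takes precedence over fasta.
var_info$peptide <- c("NGTA", "LSK")
var_info$peptide_site <- c(1L, 2L)
sites_from_peptide <- resolve_site(var_info, fasta = fasta)
print(sites_from_fasta)
print(sites_from_peptide)
```

Both calls label the site with the protein-level position, so the peptide is used only to read off the amino acid.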
