
The goal of glydet is to describe glycosylation structural properties in a site-specific manner. In the field of glycomics, this analytical approach is known as derived traits. glydet provides functions to calculate derived traits that are well defined in the literature, and implements a domain-specific language for defining custom derived traits.

Installation

You can install the latest release of glydet from GitHub with:

# install.packages("remotes")
remotes::install_github("glycoverse/glydet@*release")

Or install the development version:

remotes::install_github("glycoverse/glydet")

Documentation

  • 🚀 Get started: Here
  • 🔧 Custom derived traits: Here
  • 📚 Reference: Here

Role in glycoverse

glydet is a high-level package in the glycoverse ecosystem. It is designed to be used by glycomics or glycoproteomics researchers directly to calculate derived traits. It is built on top of many other packages in the glycoverse ecosystem, including glyexp, glyrepr, glyparse, and glymotif.

Example

First, let’s load the necessary packages and get the data ready.

library(glyexp)
library(glyclean)
#> 
#> Attaching package: 'glyclean'
#> The following object is masked from 'package:stats':
#> 
#>     aggregate
library(glydet)

exp <- auto_clean(real_experiment)
#> ℹ Normalizing data (Median)
#> ✔ Normalizing data (Median) [71ms]
#> 
#> ℹ Removing variables with >50% missing values
#> ✔ Removing variables with >50% missing values [38ms]
#> 
#> ℹ Imputing missing values
#> ℹ Sample size <= 30, using sample minimum imputation
#> ✔ Imputing missing values [10ms]
#> 
#> ℹ Aggregating data
#> ✔ Aggregating data [349ms]
#> 
#> ℹ Normalizing data again
#> ✔ Normalizing data again [7ms]
exp
#> 
#> ── Experiment ──────────────────────────────────────────────────────────────────
#> ℹ Expression matrix: 12 samples, 3880 variables
#> ℹ Sample information fields: group <chr>
#> ℹ Variable information fields: protein <chr>, gene <chr>, glycan_composition <glyrpr_c>, glycan_structure <glyrpr_s>, protein_site <int>

Now, let’s calculate some derived traits!

trait_exp <- derive_traits(exp)
trait_exp
#> 
#> ── Experiment ──────────────────────────────────────────────────────────────────
#> ℹ Expression matrix: 12 samples, 3836 variables
#> ℹ Sample information fields: group <chr>
#> ℹ Variable information fields: protein <chr>, protein_site <int>, trait <chr>, gene <chr>

Voilà! What you see is a brand-new experiment() object of the “traitomics” type. Think of it as your original dataset’s sophisticated cousin 🎭: instead of tracking “quantification of each glycan on each glycosite in each sample,” it now contains “the value of each derived trait on each glycosite in each sample.”
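To make the idea of a site-level trait concrete, here is a toy sketch in base R. It is not glydet's implementation, and glydet's own trait definitions may differ. The data, the `fucosylated` flag, and the helper `site_fraction()` are all made up for illustration. The sketch computes, for each glycosite and sample, the abundance-weighted fraction of glycans that carry a given property.

```r
# Toy data: rows are glycoforms (one glycan on one glycosite), columns are samples.
# All names and numbers are invented for illustration.
abund <- matrix(
  c(10, 20,
    30, 20,
    60, 60,
     5, 15),
  ncol = 2, byrow = TRUE,
  dimnames = list(NULL, c("S1", "S2"))
)
glycoforms <- data.frame(
  site        = c("P1_49", "P1_49", "P1_49", "P2_88"),
  fucosylated = c(TRUE, FALSE, FALSE, TRUE)
)

# Hypothetical helper: per-site, per-sample fraction of abundance
# contributed by glycoforms that have the property.
site_fraction <- function(abund, site, has_property) {
  # The logical vector recycles down the columns, so row i is scaled by has_property[i].
  num <- rowsum(abund * has_property, group = site)
  den <- rowsum(abund, group = site)
  num / den
}

toy_trait <- site_fraction(abund, glycoforms$site, glycoforms$fucosylated)
toy_trait
```

The result has one row per glycosite instead of one row per glycoform, which is the same change of shape described above: site `P1_49` gets 0.1 in `S1` and 0.2 in `S2`, and `P2_88` gets 1 in both samples because its only glycoform is flagged.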

get_var_info(trait_exp)
#> # A tibble: 3,836 × 5
#>    variable protein protein_site trait gene 
#>    <chr>    <chr>          <int> <chr> <chr>
#>  1 V1       A6NJW9            49 TM    CD8B2
#>  2 V2       A6NJW9            49 TH    CD8B2
#>  3 V3       A6NJW9            49 TC    CD8B2
#>  4 V4       A6NJW9            49 MM    CD8B2
#>  5 V5       A6NJW9            49 CA2   CD8B2
#>  6 V6       A6NJW9            49 CA3   CD8B2
#>  7 V7       A6NJW9            49 CA4   CD8B2
#>  8 V8       A6NJW9            49 TF    CD8B2
#>  9 V9       A6NJW9            49 TFc   CD8B2
#> 10 V10      A6NJW9            49 TFa   CD8B2
#> # ℹ 3,826 more rows
get_expr_mat(trait_exp)[1:5, 1:5]
#>    C1 C2 C3 H1 H2
#> V1  0  0  0  0  0
#> V2  0  0  0  0  0
#> V3  1  1  1  1  1
#> V4 NA NA NA NA NA
#> V5  1  1  1  1  1
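
The variable table and the expression matrix describe the same rows, so a common next step is to combine them and pull out one trait on one glycosite. The sketch below does this in base R on small stand-in objects typed in from the output printed above, so that it runs without glydet installed. With real data you would start from `get_var_info(trait_exp)` and `get_expr_mat(trait_exp)` instead. It assumes that the `variable` column matches the matrix row names, as the printed output suggests.

```r
# Stand-ins for get_var_info(trait_exp) and get_expr_mat(trait_exp)[1:5, 1:5],
# copied from the printed output above.
var_info <- data.frame(
  variable     = paste0("V", 1:5),
  protein      = "A6NJW9",
  protein_site = 49L,
  trait        = c("TM", "TH", "TC", "MM", "CA2"),
  gene         = "CD8B2"
)
expr_mat <- matrix(
  c( 0,  0,  0,  0,  0,
     0,  0,  0,  0,  0,
     1,  1,  1,  1,  1,
    NA, NA, NA, NA, NA,
     1,  1,  1,  1,  1),
  nrow = 5, byrow = TRUE,
  dimnames = list(paste0("V", 1:5), c("C1", "C2", "C3", "H1", "H2"))
)

# Reshape the matrix to one row per (variable, sample) pair.
# as.vector() walks the matrix column by column, matching the rep() calls.
long <- data.frame(
  variable = rep(rownames(expr_mat), times = ncol(expr_mat)),
  sample   = rep(colnames(expr_mat), each = nrow(expr_mat)),
  value    = as.vector(expr_mat)
)

# Attach the trait annotations to every value.
long <- merge(var_info, long, by = "variable")

# Values of one trait on one glycosite across samples.
subset(long, trait == "TC" & protein_site == 49L, select = c(sample, value))
```

Keeping the annotations next to the values in a long table makes it easy to filter by `trait`, `protein`, or `protein_site` before plotting or testing. Rows such as `V4` keep their `NA` values through the reshape, so they can be inspected or dropped explicitly.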