The goal of glydet is to describe the structural properties of glycosylation in a site-specific manner. In glycomics, these summary measures are known as derived traits. glydet provides functions to calculate derived traits that are well defined in the literature, and implements a domain-specific language for defining custom derived traits.
Installation
You can install the latest release of glydet from GitHub with:
# install.packages("remotes")
remotes::install_github("glycoverse/glydet@*release")
Or install the development version:
remotes::install_github("glycoverse/glydet")
Role in glycoverse
glydet is a high-level package in the glycoverse ecosystem. It is designed to be used directly by glycomics or glycoproteomics researchers to calculate derived traits. It is built on top of many other packages in the glycoverse ecosystem, including glyexp, glyrepr, glyparse, and glymotif.
Example
First, let's load the necessary packages and get the data ready.
library(glyexp)
library(glyclean)
#>
#> Attaching package: 'glyclean'
#> The following object is masked from 'package:stats':
#>
#> aggregate
library(glydet)
exp <- auto_clean(real_experiment)
#> ℹ Normalizing data (Median)
#> ✔ Normalizing data (Median) [71ms]
#>
#> ℹ Removing variables with >50% missing values
#> ✔ Removing variables with >50% missing values [38ms]
#>
#> ℹ Imputing missing values
#> ℹ Sample size <= 30, using sample minimum imputation
#> ✔ Imputing missing values [10ms]
#>
#> ℹ Aggregating data
#> ✔ Aggregating data [349ms]
#>
#> ℹ Normalizing data again
#> ✔ Normalizing data again [7ms]
exp
#>
#> ── Experiment ──────────────────────────────────────────────────────────────────
#> ℹ Expression matrix: 12 samples, 3880 variables
#> ℹ Sample information fields: group <chr>
#> ℹ Variable information fields: protein <chr>, gene <chr>, glycan_composition <glyrpr_c>, glycan_structure <glyrpr_s>, protein_site <int>
Now, let's calculate some derived traits!
trait_exp <- derive_traits(exp)
trait_exp
#>
#> ── Experiment ──────────────────────────────────────────────────────────────────
#> ℹ Expression matrix: 12 samples, 3836 variables
#> ℹ Sample information fields: group <chr>
#> ℹ Variable information fields: protein <chr>, protein_site <int>, trait <chr>, gene <chr>
Voilà! What you see is a brand new experiment() object of the "traitomics" type. Think of it as your original dataset's sophisticated cousin: instead of tracking the "quantification of each glycan on each glycosite in each sample," it now contains "the value of each derived trait on each glycosite in each sample."
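To make the idea of a derived trait concrete, here is a minimal base-R sketch that does not use glydet at all. It assumes a toy abundance matrix for the glycans on a single glycosite and a hypothetical logical flag marking which glycans are fucosylated, then computes the fucosylated share of total abundance in each sample. The names abund, is_fucosylated, and fucosylated_fraction are invented for illustration, and the exact definitions glydet uses for its built-in traits may differ.

```r
# Toy data: 3 glycans (rows) on one glycosite, quantified in 2 samples (columns).
# All values are made up for illustration.
abund <- matrix(
  c(10, 30,
    20, 10,
    70, 60),
  nrow = 3, byrow = TRUE,
  dimnames = list(c("G1", "G2", "G3"), c("S1", "S2"))
)

# Hypothetical structural property of each glycan (assumed, not real annotations).
is_fucosylated <- c(G1 = TRUE, G2 = FALSE, G3 = TRUE)

# Fraction of the site's total abundance carried by fucosylated glycans,
# computed separately for each sample.
fucosylated_fraction <-
  colSums(abund[is_fucosylated, , drop = FALSE]) / colSums(abund)

fucosylated_fraction
```

Each sample gets one number for the site, which is the shape of data a trait-level experiment holds: one row per site and trait, one column per sample.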
get_var_info(trait_exp)
#> # A tibble: 3,836 × 5
#>    variable protein protein_site trait gene
#>    <chr>    <chr>          <int> <chr> <chr>
#>  1 V1       A6NJW9            49 TM    CD8B2
#>  2 V2       A6NJW9            49 TH    CD8B2
#>  3 V3       A6NJW9            49 TC    CD8B2
#>  4 V4       A6NJW9            49 MM    CD8B2
#>  5 V5       A6NJW9            49 CA2   CD8B2
#>  6 V6       A6NJW9            49 CA3   CD8B2
#>  7 V7       A6NJW9            49 CA4   CD8B2
#>  8 V8       A6NJW9            49 TF    CD8B2
#>  9 V9       A6NJW9            49 TFc   CD8B2
#> 10 V10      A6NJW9            49 TFa   CD8B2
#> # ℹ 3,826 more rows
get_expr_mat(trait_exp)[1:5, 1:5]
#>    C1 C2 C3 H1 H2
#> V1  0  0  0  0  0
#> V2  0  0  0  0  0
#> V3  1  1  1  1  1
#> V4 NA NA NA NA NA
#> V5  1  1  1  1  1
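In the output above, the variable IDs (V1, V2, and so on) appear both in the variable column of the variable information and as row names of the expression matrix, so they can be used to link the two. The following is a rough sketch of using that link to pull out the values of a single trait. It runs on small stand-in objects so that it is self-contained: var_info and expr_mat are toy substitutes for the results of get_var_info(trait_exp) and get_expr_mat(trait_exp), and all proteins, sites, and values in them are invented.

```r
# Toy stand-in for get_var_info(trait_exp); contents are invented.
var_info <- data.frame(
  variable = c("V1", "V2", "V3", "V4"),
  protein = c("P1", "P1", "P2", "P2"),
  protein_site = c(49L, 49L, 120L, 120L),
  trait = c("TM", "TF", "TM", "TF"),
  stringsAsFactors = FALSE
)

# Toy stand-in for get_expr_mat(trait_exp): variables in rows, samples in columns.
expr_mat <- matrix(
  c(0.0, 0.1,
    0.8, 0.9,
    0.2, 0.3,
    0.5, NA),
  nrow = 4, byrow = TRUE,
  dimnames = list(var_info$variable, c("C1", "H1"))
)

# Variable IDs that belong to the trait of interest.
tf_vars <- var_info$variable[var_info$trait == "TF"]

# Subset the matrix by row name, keeping matrix shape even for a single row.
tf_values <- expr_mat[tf_vars, , drop = FALSE]

# Relabel rows as "protein_site" so each row is readable on its own.
idx <- match(tf_vars, var_info$variable)
rownames(tf_values) <- paste(var_info$protein[idx], var_info$protein_site[idx], sep = "_")

tf_values
```

With a real trait experiment, the same subsetting should work after replacing the two stand-ins with the actual accessor results, assuming the variable column and the matrix row names match as they do in the output shown.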