
Get Started with glyexp
Picture this: you're knee-deep in omics experiments (especially the fascinating world of glycomics and glycoproteomics), and you're juggling three types of data like a lab virtuoso:
- Expression data - the actual measurements of your biological molecules (glycans, glycopeptides, and their friends)
- Molecular annotations - the ID cards for your molecules (structures, sequences, you name it)
- Experimental metadata - the story behind your samples (time points, treatments, experimental conditions)
Here's where glyexp swoops in to save the day!
The experiment() class is your new best friend - think of it as a smart container that keeps all three data types organized and talking to each other. No more scattered spreadsheets or lost annotations!
Why should you care? Every package in the glycoverse ecosystem speaks experiment() fluently. It's like having a universal translator for your glycomics workflow - everything just clicks together.
library(glyexp)
library(dplyr)
#>
#> Attaching package: 'dplyr'
#> The following object is masked from 'package:glyexp':
#>
#> select_var
#> The following objects are masked from 'package:stats':
#>
#> filter, lag
#> The following objects are masked from 'package:base':
#>
#> intersect, setdiff, setequal, union
library(conflicted)
# Resolve function conflicts - prefer glyexp version over deprecated dplyr version
# `dplyr::select_var` is deprecated anyway, so we can safely override it
conflicts_prefer(glyexp::select_var)
#> [conflicted] Will prefer glyexp::select_var over
#> any other package.
Your First Steps into the Glycoverse
Let's dive in with our trusty toy experiment - think of it as your training wheels before you tackle the real deal.
toy_exp <- toy_experiment
toy_exp
#>
#> ── Glycoproteomics Experiment ──────────────────────────────────────────────────
#> ℹ Expression matrix: 6 samples, 4 variables
#> ℹ Sample information fields: group <chr>, batch <dbl>
#> ℹ Variable information fields: protein <chr>, peptide <chr>, glycan_composition <chr>
Look at that beautiful summary! When you print an experiment() object, it's like getting a snapshot of your entire experimental world - variables, observations, and all the metadata that makes your data meaningful.
Now, let's peek under the hood. You can extract the three core components faster than you can say "glycosylation":
The Expression Matrix - Your Data's Heart and Soul
get_expr_mat(toy_exp)
#> S1 S2 S3 S4 S5 S6
#> V1 1 5 9 13 17 21
#> V2 2 6 10 14 18 22
#> V3 3 7 11 15 19 23
#> V4 4 8 12 16 20 24
This matrix is where the magic happens - rows are your variables (molecules), columns are your observations (samples), and the numbers tell your biological story.
Variable Information - Meet Your Molecules
get_var_info(toy_exp)
#> # A tibble: 4 × 4
#> variable protein peptide glycan_composition
#> <chr> <chr> <chr> <chr>
#> 1 V1 PRO1 PEP1 H5N2
#> 2 V2 PRO2 PEP2 H5N2
#> 3 V3 PRO3 PEP3 H3N2
#> 4 V4 PRO3 PEP4 H3N2
Think of this as your molecular address book - every variable gets its own detailed profile.
Sample Information - Know Your Experiments
get_sample_info(toy_exp)
#> # A tibble: 6 × 3
#> sample group batch
#> <chr> <chr> <dbl>
#> 1 S1 A 1
#> 2 S2 A 2
#> 3 S3 A 1
#> 4 S4 B 2
#> 5 S5 B 1
#> 6 S6 B 2
And this? This is your experimental diary - tracking every condition, timepoint, and treatment.
Here's the cool part: Notice how the "variable" column in get_var_info() and the "sample" column in get_sample_info() perfectly match the row and column names in your expression matrix? That's no accident!
These are the index columns - the secret sauce that keeps everything synchronized. They're like the GPS coordinates that ensure your data stays connected no matter what transformations you throw at it.
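A quick way to convince yourself of this is to compare the index columns against the matrix dimnames. The sketch below rebuilds the toy data printed above in plain base R; the object names `expr_mat`, `sample_info`, and `var_info` are placeholders for this illustration, not glyexp internals.

```r
# Rebuild the toy data printed above using plain base R objects.
expr_mat <- matrix(
  1:24, nrow = 4,
  dimnames = list(paste0("V", 1:4), paste0("S", 1:6))
)
sample_info <- data.frame(
  sample = paste0("S", 1:6),
  group  = rep(c("A", "B"), each = 3),
  batch  = c(1, 2, 1, 2, 1, 2)
)
var_info <- data.frame(
  variable           = paste0("V", 1:4),
  protein            = c("PRO1", "PRO2", "PRO3", "PRO3"),
  peptide            = paste0("PEP", 1:4),
  glycan_composition = c("H5N2", "H5N2", "H3N2", "H3N2")
)

# The index columns mirror the matrix dimnames, in the same order.
rows_match <- identical(var_info$variable, rownames(expr_mat))
cols_match <- identical(sample_info$sample, colnames(expr_mat))
c(rows = rows_match, cols = cols_match)
```

With a real experiment, the same comparison should work on the results of get_var_info(), get_sample_info(), and get_expr_mat().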
Data Wrangling Made Easy - dplyr Meets glyexp
If you've ever used dplyr (and who hasn't?), you're already 90% of the way there!
For most dplyr verbs you know and love, glyexp gives you two specialized versions:
- _obs() functions: work on your sample metadata
- _var() functions: work on your variable annotations
Let's see this in action. Want to focus on just group "A" samples?
subset_exp <- filter_obs(toy_exp, group == "A")
Let's check what happened to our sample info:
get_sample_info(subset_exp)
#> # A tibble: 3 × 3
#> sample group batch
#> <chr> <chr> <dbl>
#> 1 S1 A 1
#> 2 S2 A 2
#> 3 S3 A 1
Beautiful! But here's where the magic really shines - check out the expression matrix:
get_expr_mat(subset_exp)
#> S1 S2 S3
#> V1 1 5 9
#> V2 2 6 10
#> V3 3 7 11
#> V4 4 8 12
Ta-da! The expression matrix automatically filtered itself to match! It's like having a well-trained assistant who anticipates your every move.
This is filter_obs() in a nutshell: "Hey, filter my sample info this way, and oh yeah, make sure everything else follows suit." And it does, flawlessly.
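To make the idea concrete, here is a rough base R sketch of the two steps that filter_obs(toy_exp, group == "A") appears to perform on the toy data. It is a mental model only, not glyexp's actual implementation, and the object names are made up for the illustration.

```r
# Toy data rebuilt from the output shown earlier.
expr_mat <- matrix(
  1:24, nrow = 4,
  dimnames = list(paste0("V", 1:4), paste0("S", 1:6))
)
sample_info <- data.frame(
  sample = paste0("S", 1:6),
  group  = rep(c("A", "B"), each = 3),
  batch  = c(1, 2, 1, 2, 1, 2)
)

# Step 1: filter the sample info on the requested condition.
kept_info <- sample_info[sample_info$group == "A", ]

# Step 2: use the surviving index values to subset the matrix columns,
# so the matrix and the sample info stay in sync.
kept_mat <- expr_mat[, kept_info$sample, drop = FALSE]
kept_mat
```

The key point is that the "sample" index column is what carries the filter over from the metadata to the matrix.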
Variable filtering works the same way:
toy_exp |>
filter_obs(group == "A") |>
filter_var(glycan_composition == "H5N2") |>
get_expr_mat()
#> S1 S2 S3
#> V1 1 5 9
#> V2 2 6 10
Notice how these functions support the pipe operator (|>)? That's the dplyr DNA in action!
The pattern is simple: glyexp functions are just like their dplyr cousins, but with two superpowers:
- They expect and return experiment() objects (keeping your data ecosystem intact)
- They treat those index columns like precious cargo (no accidental deletions here!)
Complete dplyr Function Reference
Here's your complete toolkit of supported dplyr-style functions. These functions orchestrate seamless coordination between all three data types - expression matrix, sample information, and variable information - ensuring everything stays perfectly synchronized:
The magic ingredient? Every single one of these functions automatically updates the expression matrix to match your metadata operations. Filter out half your samples? The matrix follows suit. Rearrange your variables? The matrix dances to the same tune.
What about other dplyr functions? For functions not directly supported (like distinct(), pull(), count(), etc.), simply extract the tibble first and go wild:
# Extract the tibble, then use any dplyr function you want
toy_exp |>
get_sample_info() |>
distinct(group)
toy_exp |>
get_var_info() |>
pull(protein) |>
unique()
toy_exp |>
get_sample_info() |>
count(group)
The Sacred Index Columns - Handle with Care
Remember those index columns we mentioned? Here's the golden rule: Don't mess with them directly! The reason is simple: glyexp relies on them to synchronize the expression matrix, sample info, and variable info. Look at this picture to understand it better:
Think of them as the foundation of your data house - you can redecorate all you want, but don't touch the support beams.
Want to select specific columns from your sample info? Easy:
toy_exp |>
select_obs(group) |>
get_sample_info()
#> # A tibble: 6 × 2
#> sample group
#> <chr> <chr>
#> 1 S1 A
#> 2 S2 A
#> 3 S3 A
#> 4 S4 B
#> 5 S5 B
#> 6 S6 B
See how the "sample" column (our trusty index) stuck around? That's glyexp being protective of your data integrity.
Even when you try to be sneaky, it's got your back:
toy_exp |>
select_obs(-starts_with("sample")) |>
get_sample_info()
#> # A tibble: 6 × 3
#> sample group batch
#> <chr> <chr> <dbl>
#> 1 S1 A 1
#> 2 S2 A 2
#> 3 S3 A 1
#> 4 S4 B 2
#> 5 S5 B 1
#> 6 S6 B 2
Nice try, but that index column isn't going anywhere!
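One way to picture this protection is a selection helper that always adds the index column back in. The base R sketch below uses an invented helper, `select_keep_index()`, purely for illustration; how select_obs() achieves this internally is not shown here.

```r
sample_info <- data.frame(
  sample = paste0("S", 1:6),
  group  = rep(c("A", "B"), each = 3),
  batch  = c(1, 2, 1, 2, 1, 2)
)

# Hypothetical helper: keep the requested columns, but always put the
# index column first, even when the caller did not ask for it.
select_keep_index <- function(df, cols, index = "sample") {
  df[, union(index, setdiff(cols, index)), drop = FALSE]
}

# Asking for "group" only still returns the index column.
only_group <- select_keep_index(sample_info, "group")

# Asking for everything except the index still returns the index column.
no_index <- select_keep_index(sample_info, c("group", "batch"))

names(only_group)
names(no_index)
```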
Slicing and Dicing - Matrix-Style Subsetting
Want to subset your experiment? Think matrix indexing, but smarter:
subset_exp <- toy_exp[, 1:3]
This grabs the first 3 samples, and like a good butler, updates everything else accordingly:
get_expr_mat(subset_exp)
#> S1 S2 S3
#> V1 1 5 9
#> V2 2 6 10
#> V3 3 7 11
#> V4 4 8 12
get_sample_info(subset_exp)
#> # A tibble: 3 × 3
#> sample group batch
#> <chr> <chr> <dbl>
#> 1 S1 A 1
#> 2 S2 A 2
#> 3 S3 A 1
Both the expression matrix and sample info are perfectly in sync. It's like they're dancing to the same tune!
Merging and Splitting - The Dynamic Duo
Imagine this: you run your favorite glycopeptide identification software twice, once per batch, and use glyread to load the results into two experiment() objects. How do you combine them into a single experiment()? The answer is merge():
merge(exp1, exp2)
What it does is quite complex, but you can rely on merge() to handle it for you. If you want to keep the batch information, you can use mutate_obs() to add an ID column before merging.
What if you have more than two experiments to merge? Put them in a list and use purrr::reduce() to merge them:
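With glyexp objects, the call would presumably look like `purrr::reduce(list(exp1, exp2, exp3), merge)`, where `exp1`, `exp2`, and `exp3` are hypothetical experiments. The base R sketch below shows only the folding pattern, using `Reduce()` and a stand-in combiner (`combine_two()`, invented here) that joins plain matrices by column. The real merge() does far more than this.

```r
# Three hypothetical batches measured on the same two variables.
batch1 <- matrix(1:4,  nrow = 2, dimnames = list(c("V1", "V2"), c("S1", "S2")))
batch2 <- matrix(5:8,  nrow = 2, dimnames = list(c("V1", "V2"), c("S3", "S4")))
batch3 <- matrix(9:12, nrow = 2, dimnames = list(c("V1", "V2"), c("S5", "S6")))

# Stand-in for merge(): combines exactly two objects into one,
# aligning the rows of the second object to the first.
combine_two <- function(x, y) cbind(x, y[rownames(x), , drop = FALSE])

# Reduce() applies the combiner pairwise, left to right:
# combine_two(combine_two(batch1, batch2), batch3)
combined <- Reduce(combine_two, list(batch1, batch2, batch3))
colnames(combined)
```

The fold works for any number of list elements, because each step only ever combines two objects.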
The opposite of merge() is split(), which splits an experiment() into a list of experiment() objects. You can provide a column to split by, and the unique values of that column will be used as the names of the list.
split(toy_exp, group, where = "sample_info")
#> $A
#>
#> ── Glycoproteomics Experiment ──────────────────────────────────────────────────
#> ℹ Expression matrix: 3 samples, 4 variables
#> ℹ Sample information fields: group <chr>, batch <dbl>
#> ℹ Variable information fields: protein <chr>, peptide <chr>, glycan_composition <chr>
#>
#> $B
#>
#> ── Glycoproteomics Experiment ──────────────────────────────────────────────────
#> ℹ Expression matrix: 3 samples, 4 variables
#> ℹ Sample information fields: group <chr>, batch <dbl>
#> ℹ Variable information fields: protein <chr>, peptide <chr>, glycan_composition <chr>
Now the "A" experiment only contains samples from group "A", and the "B" experiment only those from group "B".
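If you like to see the moving parts, splitting by a sample-info column can be pictured as grouping the sample IDs and then subsetting the matrix once per group. This is a base R sketch on the toy data, with made-up object names, and it is not how glyexp implements split():

```r
expr_mat <- matrix(
  1:24, nrow = 4,
  dimnames = list(paste0("V", 1:4), paste0("S", 1:6))
)
sample_info <- data.frame(
  sample = paste0("S", 1:6),
  group  = rep(c("A", "B"), each = 3),
  batch  = c(1, 2, 1, 2, 1, 2)
)

# Group the sample IDs by the values of the `group` column.
samples_by_group <- split(sample_info$sample, sample_info$group)

# Build one sub-matrix per group; the list names come from the
# unique values of the splitting column.
mats_by_group <- lapply(
  samples_by_group,
  function(ids) expr_mat[, ids, drop = FALSE]
)
names(mats_by_group)
```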
When You Need to Break Free - The Tibble Escape Hatch
The glycoverse ecosystem is pretty comprehensive, but we know there are times when you need to venture beyond our cozy world. When that moment comes, you can always fall back to basic R data structures with get_expr_mat(), get_sample_info(), and get_var_info().
Alternatively, use as_tibble() to convert your experiment() to a tibble in tidy format:
as_tibble(toy_exp)
#> # A tibble: 24 × 8
#> sample group batch variable protein peptide glycan_composition value
#> <chr> <chr> <dbl> <chr> <chr> <chr> <chr> <int>
#> 1 S1 A 1 V1 PRO1 PEP1 H5N2 1
#> 2 S2 A 2 V1 PRO1 PEP1 H5N2 5
#> 3 S3 A 1 V1 PRO1 PEP1 H5N2 9
#> 4 S4 B 2 V1 PRO1 PEP1 H5N2 13
#> 5 S5 B 1 V1 PRO1 PEP1 H5N2 17
#> 6 S6 B 2 V1 PRO1 PEP1 H5N2 21
#> 7 S1 A 1 V2 PRO2 PEP2 H5N2 2
#> 8 S2 A 2 V2 PRO2 PEP2 H5N2 6
#> 9 S3 A 1 V2 PRO2 PEP2 H5N2 10
#> 10 S4 B 2 V2 PRO2 PEP2 H5N2 14
#> # ℹ 14 more rows
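Where do the 24 rows and 8 columns come from? There is one row per (variable, sample) pair, 4 × 6 = 24, with both metadata tables joined on through the index columns. The base R sketch below reproduces that shape on the toy data. It is only an illustration of the tidy layout, with made-up object names, and its row and column order differ from what as_tibble() prints.

```r
expr_mat <- matrix(
  1:24, nrow = 4,
  dimnames = list(paste0("V", 1:4), paste0("S", 1:6))
)
sample_info <- data.frame(
  sample = paste0("S", 1:6),
  group  = rep(c("A", "B"), each = 3),
  batch  = c(1, 2, 1, 2, 1, 2)
)
var_info <- data.frame(
  variable           = paste0("V", 1:4),
  protein            = c("PRO1", "PRO2", "PRO3", "PRO3"),
  peptide            = paste0("PEP", 1:4),
  glycan_composition = c("H5N2", "H5N2", "H3N2", "H3N2")
)

# Melt the matrix: one row per (variable, sample) pair.
long <- as.data.frame(as.table(expr_mat), stringsAsFactors = FALSE)
names(long) <- c("variable", "sample", "value")

# Attach the metadata through the two index columns.
long <- merge(long, sample_info, by = "sample")
long <- merge(long, var_info, by = "variable")
dim(long)
```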
Pro tip: These tibbles can get really long (think novel-length), especially with all that rich metadata. Smart analysts filter their experiments first:
toy_exp |>
filter_var(glycan_composition == "H5N2") |>
select_obs(group) |>
select_var(-glycan_composition) |>
as_tibble()
#> # A tibble: 12 × 6
#> sample group variable protein peptide value
#> <chr> <chr> <chr> <chr> <chr> <int>
#> 1 S1 A V1 PRO1 PEP1 1
#> 2 S2 A V1 PRO1 PEP1 5
#> 3 S3 A V1 PRO1 PEP1 9
#> 4 S4 B V1 PRO1 PEP1 13
#> 5 S5 B V1 PRO1 PEP1 17
#> 6 S6 B V1 PRO1 PEP1 21
#> 7 S1 A V2 PRO2 PEP2 2
#> 8 S2 A V2 PRO2 PEP2 6
#> 9 S3 A V2 PRO2 PEP2 10
#> 10 S4 B V2 PRO2 PEP2 14
#> 11 S5 B V2 PRO2 PEP2 18
#> 12 S6 B V2 PRO2 PEP2 22
Much more manageable, right?
Standing on the Shoulders of Giants
Designing experiment() wasn't done in a vacuum - we learned from some amazing predecessors:
SummarizedExperiment
The granddaddy of omics data containers from Bioconductor. Solid as a rock for RNA-seq, but not quite "tidy" enough for our taste.
tidySummarizedExperiment
A brilliant attempt to bring tidy principles to SummarizedExperiment from the tidySummarizedExperiment package. We love the concept, but felt that cramming everything into one tibble doesn't quite capture the mental model of separated data types.
massdataset
Our closest cousin! The massdataset package gets so many things right - tidy operations, clean data separation, perfect for mass spec data. We especially admire its data processing history tracking (reproducibility FTW!).
But here's our twist: while object-oriented programming has its merits, we believe most R users think functionally. Your code is your reproducibility trail - elegant, transparent, and familiar to every R user.
Our Philosophy
We chose the functional programming path because it feels like home to R users. No hidden states, no mysterious transformations - just clear, chainable functions that do exactly what they say on the tin.
Huge thanks to all the developers who paved this road. glyexp exists because of your groundbreaking work!
What's Next?
Now you have a basic understanding of experiment(). Next, you can learn how to use experiment() in your analysis.