
Get Started with glysmith
Glycoverse is an incredibly comprehensive ecosystem. On one hand, this means you can flexibly perform a wide range of analyses by combining its different components. On the other hand, getting started can require learning quite a lot.
glysmith is the highest-level package in the glycoverse
suite. It encapsulates the most common analysis workflows into a unified
interface, allowing you to execute the entire analytical pipeline in
just one function call.
If you’ve seen The Lord of the Rings, you can think of
glysmith as the “One Ring to rule them all.”
Note: glysmith is not part of the core glycoverse
packages. You need to install it separately, even if you have installed
the meta package glycoverse.
Analyze your data with a single line of code
We will demonstrate using the built-in experiment object from
glyexp. This is an N-glycoproteomics dataset containing 12
samples across four different liver conditions: healthy controls (H),
hepatitis (M), cirrhosis (Y), and hepatocellular carcinoma (C), with 3
samples for each group. Please note, the dataset is raw and has not been
normalized or imputed.
To work with your own data, you can use glyread to
import outputs from tools like pGlyco3, MSFragger-Glyco, or Byonic.
Alternatively, you can create your own experiment object. For more
details, see "Get Started with glyread" or "Creating Experiments".
real_experiment
#>
#> ── Glycoproteomics Experiment ──────────────────────────────────────────────────
#> ℹ Expression matrix: 12 samples, 4262 variables
#> ℹ Sample information fields: group <fct>
#> ℹ Variable information fields: peptide <chr>, peptide_site <int>, protein <chr>, protein_site <int>, gene <chr>, glycan_composition <glyrpr_c>, glycan_structure <glyrpr_s>
Now, let’s run the default analysis pipeline with just a single line of code.
result <- forge_analysis(real_experiment)
result
That’s it! You’ve completed the following steps in one go:
- Preprocessed your data using glyclean::auto_clean() and generated QC plots.
- Summarized the experiment with glyexp::summarize_experiment().
- Performed principal component analysis with glystats::gly_pca().
- Conducted differential expression analysis using glystats::gly_limma().
- Plotted a volcano plot via glyvis::plot_volcano().
- Generated a heatmap for significant glycoforms using glyvis::plot_heatmap().
- Conducted GO enrichment analysis using glystats::gly_enrich_go().
- Performed KEGG enrichment analysis using glystats::gly_enrich_kegg().
- Performed Reactome enrichment analysis using glystats::gly_enrich_reactome().
- Derived site-specific traits using glydet::derive_traits().
- Conducted differential trait analysis with glystats::gly_limma().
- Generated a heatmap of significant site-specific derived traits via glyvis::plot_heatmap().
What a comprehensive analysis pipeline!
Save all results with one line of code
Once you have your results, what’s next? You can export all tables and plots to a target directory in just a single line of code.
quench_result(result, "path/to/save")
Running this will create the folder path/to/save with the following content:
- README.md: Describes all the saved outputs.
- experiment.rds: The processed experiment object.
- meta.rds: The metadata list.
- plots/: All plots generated from the pipeline.
- tables/: All tables output from the analysis.
Give it a try yourself!
One line to generate a report
We’re developing advanced report generation features, but for now,
you can use polish_report() to generate a report directly
from the results.
polish_report(result, "path/to/save/report.html")
By default, the report will open automatically in your browser when it’s finished.
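Putting the three calls together, a whole session might look like the sketch below. It only uses the functions already shown here. It assumes that attaching glyexp makes real_experiment available, and the output directory under tempdir() is an arbitrary choice for illustration.

```r
library(glyexp)    # assumed to provide the built-in real_experiment object
library(glysmith)

# Run the default blueprint on the built-in dataset.
result <- forge_analysis(real_experiment)

# Export all tables and plots to a scratch directory (illustrative location).
out_dir <- file.path(tempdir(), "glysmith_output")
quench_result(result, out_dir)

# Render the HTML report next to the exported files.
polish_report(result, file.path(out_dir, "report.html"))
```

Replace out_dir with a permanent location when you want to keep the results beyond the current R session.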
About naming
You may have noticed some unusual names in this package. First, the
‘smith’ in glysmith doesn’t refer to a surname; it’s taken
from “blacksmith”—someone who makes and repairs items made of iron.
All functions and classes are named using blacksmith jargon. For
example, forge_analysis() for “forging” results,
quench_result() for “quenching” the results to save them,
and polish_report() for “polishing” the report to improve
readability.
You’ll learn about “blueprint” in the next section.
Blueprints
Imagine you visit a blacksmith to order a sword. If you simply say, “I want a sword,” the blacksmith will use a default blueprint and start working on it. However, you always have the option to bring your own design—a custom blueprint—that the smith can use for your sword.
Similarly, forge_analysis() uses a default blueprint for
the analysis pipeline. You can inspect it via
blueprint_default().
blueprint_default()
#>
#> ── Blueprint (13 steps) ──
#>
#> • step_ident_overview()
#> • step_preprocess()
#> • step_plot_qc(when = "post")
#> • step_pca()
#> • step_dea_limma()
#> • step_volcano()
#> • step_heatmap(on = "sig_exp")
#> • step_sig_enrich_go()
#> • step_sig_enrich_kegg()
#> • step_sig_enrich_reactome()
#> • step_derive_traits()
#> • step_dea_limma(on = "trait_exp")
#> • step_heatmap(on = "sig_trait_exp")
A blueprint is a list of steps executed sequentially. You can use
blueprint() with step_ functions to create
your own custom blueprint. For instance, if you’d just like to perform
differential expression analysis and plot the volcano plot, you could
do:
bp <- blueprint(
  step_preprocess(),
  step_dea_limma(),
  step_volcano()
)
result <- forge_analysis(real_experiment, bp)
result
You can also save a blueprint to a file and reload it later:
write_blueprint(bp, "path/to/save/bp.rds")
bp <- read_blueprint("path/to/save/bp.rds")
Here is a list of all available steps:
- step_ident_overview(): Summarize the experiment identification.
- step_preprocess(): Preprocess the experiment data and generate QC plots.
- step_pca(), step_tsne(), step_umap(): Dimension reduction for data exploration.
- step_dea_limma(), step_dea_ttest(), step_dea_anova(), step_dea_wilcox(), step_dea_kruskal(): Differential expression analysis with various methods.
- step_volcano(): Create volcano plots for DEA results.
- step_heatmap(): Generate heatmaps for expression data.
- step_roc(): Perform Receiver Operating Characteristic (ROC) analysis.
- step_sig_enrich_go(), step_sig_enrich_kegg(), step_sig_enrich_reactome(): Functional enrichment analysis for significant items.
- step_derive_traits(): Calculate glycan derived traits.
- step_quantify_motifs(): Quantify glycan motifs (substructures).
Before you start building your own blueprints, here are a few rules to keep in mind:
- step_preprocess() is generally the first step, but it’s optional if you perform preprocessing manually with glyclean.
- Some steps require others to be executed first. For example, step_volcano() requires at least one of step_dea_limma(), step_dea_ttest(), or step_dea_wilcox() to be run beforehand. Consult the documentation for each step to check dependencies.
- You cannot include duplicate steps within a blueprint. For instance, you can’t add step_dea_limma() twice in the same blueprint.
- If a step overwrites data produced by an earlier step, a warning will be issued. For example, including step_dea_limma() followed by step_dea_ttest() will overwrite the dea_res data from the former with that from the latter. Ignore the warning if that’s what you intend.
- You can create branches in a blueprint by using br(). Check out its documentation for examples.
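As an illustration of these rules, the sketch below builds a blueprint from a subset of the default steps, kept in the same order as in blueprint_default(). Preprocessing comes first, the DEA step precedes the plots that depend on its results, and no step is repeated. It assumes that attaching glyexp makes real_experiment available. Placing the heatmap after the DEA step follows the default blueprint rather than any documented dependency.

```r
library(glyexp)    # assumed to provide the built-in real_experiment object
library(glysmith)

bp <- blueprint(
  step_preprocess(),             # preprocessing comes first
  step_pca(),                    # exploratory dimension reduction
  step_dea_limma(),              # DEA must run before the volcano plot
  step_volcano(),                # uses the DEA results
  step_heatmap(on = "sig_exp")   # heatmap of significant glycoforms, as in the default blueprint
)

result <- forge_analysis(real_experiment, bp)
result
```

Swapping step_dea_limma() for another DEA step is possible, but check that step's documentation first, since not every method suits a four-group design like this dataset.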
Happy forging!