A blueprint is an analytical plan that tells forge_analysis() what to do. If you don’t pass one, forge_analysis() uses the default blueprint:

blueprint_default()
#> 
#> ── Blueprint (13 steps) ──
#> 
#> • step_ident_overview()
#> • step_preprocess()
#> • step_plot_qc(when = "post")
#> • step_pca()
#> • step_dea_limma()
#> • step_volcano()
#> • step_heatmap(on = "sig_exp")
#> • step_sig_enrich_go()
#> • step_sig_enrich_kegg()
#> • step_sig_enrich_reactome()
#> • step_derive_traits()
#> • step_dea_limma(on = "trait_exp")
#> • step_heatmap(on = "sig_trait_exp")

This vignette shows you how to build your own blueprint for custom analyses.

Heads up: Before diving into the details of manual blueprint creation, you might want to try having an LLM generate one for you with inquire_blueprint(). Check out this vignette for the details.

Step functions

glysmith comes with various step functions (anything starting with step_), each handling a specific analytical task. Chain them together with blueprint():

bp <- blueprint(
  step_ident_overview(),  # Identification overview
  step_preprocess(),      # Preprocess (normalization, imputation, etc.)
  step_dea_limma(),       # DEA with limma
  step_volcano(),         # Volcano plot
)

Now bp is ready to go — just pass it as the second argument to forge_analysis():

res <- forge_analysis(exp, bp)

Here’s the full lineup of available step functions:

| Category | Function | Description | Requires glycan structure | Experiment type |
|---|---|---|---|---|
| Preprocessing | step_ident_overview() | Summarize the experiment (identification overview) | No | GP/G |
| Preprocessing | step_plot_qc() | Generate quality control plots | No | GP/G |
| Preprocessing | step_preprocess() | Preprocess the experiment (normalization, imputation, etc.) | No | GP/G |
| Preprocessing | step_subset_groups() | Subset the experiment to specific groups | No | GP/G |
| Preprocessing | step_adjust_protein() | Adjust glycoform quantification by protein abundance | No | GP |
| Differential Analysis | step_dea_limma() | Differential expression analysis using limma | No | GP/G |
| Differential Analysis | step_dea_ttest() | Differential expression analysis using t-test | No | GP/G |
| Differential Analysis | step_dea_anova() | Differential expression analysis using ANOVA | No | GP/G |
| Differential Analysis | step_dea_wilcox() | Differential expression analysis using Wilcoxon test | No | GP/G |
| Differential Analysis | step_dea_kruskal() | Differential expression analysis using Kruskal-Wallis test | No | GP/G |
| Visualization | step_volcano() | Create volcano plot from DEA results | No | GP/G |
| Visualization | step_heatmap() | Create heatmap plot | No | GP/G |
| Visualization | step_logo() | Create logo plot for glycosylation sites | No | GP |
| Visualization | step_sig_boxplot() | Boxplots for significant variables from DEA | No | GP/G |
| Dimension Reduction | step_pca() | Principal Component Analysis | No | GP/G |
| Dimension Reduction | step_tsne() | t-SNE analysis | No | GP/G |
| Dimension Reduction | step_umap() | UMAP analysis | No | GP/G |
| Dimension Reduction | step_plsda() | Partial Least Squares Discriminant Analysis | No | GP/G |
| Dimension Reduction | step_oplsda() | Orthogonal Partial Least Squares Discriminant Analysis | No | GP/G |
| Correlation & Enrichment | step_correlation() | Pairwise correlation analysis | No | GP/G |
| Correlation & Enrichment | step_sig_enrich_go() | GO enrichment analysis on DE variables | No | GP |
| Correlation & Enrichment | step_sig_enrich_kegg() | KEGG enrichment analysis on DE variables | No | GP |
| Correlation & Enrichment | step_sig_enrich_reactome() | Reactome enrichment analysis on DE variables | No | GP |
| Advanced Analysis | step_derive_traits() | Calculate glycan derived traits | Yes | GP/G |
| Advanced Analysis | step_quantify_dynamic_motifs() | Quantify dynamic glycan motifs | Yes | GP/G |
| Advanced Analysis | step_quantify_branch_motifs() | Quantify N-glycan branch motifs | Yes | GP/G |
| Advanced Analysis | step_roc() | ROC analysis for biomarker discovery | No | GP/G |
| Survival Analysis | step_cox() | Cox proportional hazards model | No | GP/G |

GP: glycoproteomics, G: glycomics

Step dependencies

Some steps need data that other steps produce. For instance, step_volcano() needs results from a DEA step like step_dea_limma(). So this blueprint won’t work:

blueprint(  # this will error
  step_preprocess(),      # Preprocess (normalization, imputation, etc.)
  step_volcano(),         # Volcano plot
)

The easiest way to figure out dependencies is to check the documentation for each step. For example, ?step_volcano() tells you it “requires one of the DEA steps to be run: step_dea_limma(), step_dea_ttest(), step_dea_wilcox()”. So as long as you run one of these DEA steps before step_volcano(), you’re good to go:

bp <- blueprint(
  step_preprocess(),      # Preprocess (normalization, imputation, etc.)
  step_dea_ttest(),       # DEA with t-test
  step_volcano(),         # Volcano plot
)

Steps and their dependencies don’t need to be right next to each other — all intermediate data gets stored along the way, so the blueprint is valid as long as the required data exists somewhere upstream:

bp <- blueprint(
  step_preprocess(),      # Preprocess (normalization, imputation, etc.)
  step_dea_ttest(),       # DEA with t-test (generates data required by `step_volcano()`)
  step_heatmap(),         # Heatmap
  step_volcano(),         # Volcano plot (requires data generated by `step_dea_ttest()`)
)
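
To make the ordering rule concrete, here is a minimal base-R sketch of the idea, not glysmith's actual implementation. The helpers toy_step() and check_order(), the fields requires and generates, and the data name dea_res are all hypothetical and exist only for this illustration. The sketch walks the steps in order and reports any step whose required data has not been produced upstream.

```r
# Toy model only: each step declares what it needs and what it produces.
# `requires`, `generates` and the name "dea_res" are illustrative assumptions,
# not glysmith's internal representation.
toy_step <- function(name, requires = character(), generates = character()) {
  list(name = name, requires = requires, generates = generates)
}

# Walk the steps in order and collect every requirement that neither the
# initial data nor an earlier step has provided.
check_order <- function(steps, available = "exp") {
  problems <- character()
  for (s in steps) {
    unmet <- setdiff(s$requires, available)
    if (length(unmet) > 0) {
      problems <- c(
        problems,
        sprintf("%s needs %s", s$name, paste(unmet, collapse = ", "))
      )
    }
    # Whatever this step generates becomes available to all later steps.
    available <- union(available, s$generates)
  }
  problems
}

preprocess <- toy_step("preprocess", requires = "exp", generates = "exp")
dea_ttest  <- toy_step("dea_ttest",  requires = "exp", generates = "dea_res")
heatmap    <- toy_step("heatmap",    requires = "exp")
volcano    <- toy_step("volcano",    requires = "dea_res")

# Volcano plot without a DEA step upstream: one problem is reported.
bad <- check_order(list(preprocess, volcano))

# DEA first, an unrelated step in between, then the volcano plot: no problems.
good <- check_order(list(preprocess, dea_ttest, heatmap, volcano))
```

In this toy model, bad contains a single message about the volcano step and good is empty, which mirrors why the distance between a step and its dependency does not matter, only the order.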

The on parameter

Many step functions have an on parameter that controls which data they operate on, and therefore which upstream steps they depend on. For example, step_heatmap() defaults to on = "exp", which plots a heatmap from the experiment (after preprocessing, if you ran it). You can point it at other data instead, such as on = "sig_exp", but then a DEA step has to run first:

bp <- blueprint(
  step_preprocess(),             # Preprocess (normalization, imputation, etc.)
  step_dea_limma(),              # DEA with limma
  step_heatmap(on = "sig_exp"),  # Heatmap of the significant variables from the DEA step
)

Check ?step_dea_limma() and you’ll see it generates a field named exactly "sig_exp". This is how data flows in glysmith: each step stores its results in a shared context object, where downstream steps can grab what they need.

Let’s see another example:

bp <- blueprint(
  # Data required: `exp`
  # Data generated: `exp` (overwrite)
  step_preprocess(),

  # Data required: `exp`
  # Data generated: `trait_exp` (trait experiment)
  step_derive_traits(),

  # Data required: `trait_exp` (specified by `on`)
  # Data generated: `sig_trait_exp` (significantly different trait experiment)
  step_dea_limma(on = "trait_exp"),

  # Data required: `sig_trait_exp`
  step_heatmap(on = "sig_trait_exp"),
)

Here’s the general rule for dataflow in glysmith: steps don’t overwrite data generated by earlier steps. The exception is step_preprocess(), which updates exp with the preprocessed version (but don’t worry — the raw data is safely backed up in raw_exp).
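
The rule can be pictured with a small base-R sketch. This is a toy model under stated assumptions, not glysmith's code: the context is a plain named list, add_field() is a hypothetical helper that refuses to overwrite an existing field, and the log2 and proportion calculations are placeholders for real preprocessing and trait derivation.

```r
# Toy model only: a "context" is a named list that steps read from and add to.
# add_field() is a hypothetical helper; raising an error on overwrite is an
# assumption made for illustration.
add_field <- function(ctx, name, value) {
  if (name %in% names(ctx)) stop("field already exists: ", name)
  ctx[[name]] <- value
  ctx
}

# The one exception: preprocessing replaces `exp`, after backing up the raw data.
toy_preprocess <- function(ctx) {
  ctx <- add_field(ctx, "raw_exp", ctx$exp)
  ctx$exp <- log2(ctx$exp + 1)  # placeholder transformation
  ctx
}

# Every other step only adds new fields.
toy_derive_traits <- function(ctx) {
  add_field(ctx, "trait_exp", ctx$exp / sum(ctx$exp))  # placeholder "traits"
}

ctx <- list(exp = c(a = 3, b = 15))
ctx <- toy_preprocess(ctx)
ctx <- toy_derive_traits(ctx)

names(ctx)
# "exp" "raw_exp" "trait_exp"
```

Calling toy_derive_traits(ctx) a second time would stop with an error in this sketch, because trait_exp already exists.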

To build valid blueprints, you’ll need to know what each step needs and what it produces. Let’s look at a more complex example to cement this. Try tracing through the dataflow yourself:

bp <- blueprint(
  step_ident_overview(),
  step_preprocess(),
  step_pca(),
  
  # DEA
  step_dea_limma(),
  step_volcano(),
  step_heatmap(on = "sig_exp"),

  # Derived trait analysis
  step_derive_traits(),
  step_dea_limma(on = "trait_exp"),
  step_heatmap(on = "sig_trait_exp"),

  # Branch motif analysis
  step_quantify_branch_motifs(),
  step_dea_limma(on = "branch_motif_exp"),
  step_heatmap(on = "sig_branch_motif_exp")
)

Branches

As we mentioned, steps can’t overwrite data from earlier steps. But what if you want to compare different DEA methods? You might try something like this:
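
A blueprint along the following lines illustrates such an attempt. It is a reconstruction based on the two objections listed next, so treat the exact step order as an assumption rather than the package's own example.

```r
library(glysmith)

# Not a valid blueprint: two DEA steps and two volcano plots in one linear chain.
bp <- blueprint(
  step_preprocess(),
  step_dea_limma(),
  step_volcano(),
  step_dea_ttest(),
  step_volcano()
)
```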

This won’t fly for two reasons:

  1. step_dea_ttest() would overwrite the sig_exp from step_dea_limma().
  2. step_volcano() shows up twice, which isn’t allowed.

These rules keep table and plot names unambiguous.

To pull this off, use branches. So far our blueprints have been linear — steps run one after another. To branch out, just wrap related steps in br():
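
A sketch of a branched blueprint, consistent with the description that follows, might look like this. The branch name "limma" comes from the text; the name "ttest" for the second branch is an assumption.

```r
library(glysmith)

bp <- blueprint(
  step_preprocess(),   # shared by both branches
  br(
    "limma",           # branch name
    step_dea_limma(),
    step_volcano()
  ),
  br(
    "ttest",           # branch name (assumed for this sketch)
    step_dea_ttest(),
    step_volcano()
  )
)
```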

The first argument names the branch, and the rest are the steps to run inside it. Each branch gets its own isolated context, so the step_volcano() in the “limma” branch naturally picks up the sig_exp from the step_dea_limma() in that same branch. Branches can also access data from before the branch point — so both branches here can use the preprocessed exp from step_preprocess().

Summary

In this vignette, we covered how to build custom blueprints, how the on parameter and step dependencies work, and how to use branches.

All of this takes some practice, so if you’d rather not dive deep into the details, remember you can always have an LLM generate a blueprint for you with inquire_blueprint().