Custom Blueprint

library(glysmith)

A blueprint is an analytical plan that tells forge_analysis() what to do. If you don’t supply one, forge_analysis() runs the default blueprint:

blueprint_default()
#> 
#> ── Blueprint (13 steps) ──
#> 
#> • step_ident_overview()
#> • step_preprocess()
#> • step_plot_qc(when = "post")
#> • step_pca()
#> • step_dea_limma()
#> • step_volcano()
#> • step_heatmap(on = "sig_exp")
#> • step_sig_enrich_go()
#> • step_sig_enrich_kegg()
#> • step_sig_enrich_reactome()
#> • step_derive_traits()
#> • step_dea_limma(on = "trait_exp")
#> • step_heatmap(on = "sig_trait_exp")

This vignette shows you how to build your own blueprint for custom analyses.

Heads up: Before diving into the details of manual blueprint creation, you might want to try having an LLM generate one for you with inquire_blueprint(). Check out this vignette for the details.

Step functions

glysmith comes with various step functions (anything starting with step_), each handling a specific analytical task. Chain them together with blueprint():

bp <- blueprint(
  step_ident_overview(),  # Identification overview
  step_preprocess(),      # Preprocess (normalization, imputation, etc.)
  step_dea_limma(),       # DEA with limma
  step_volcano(),         # Volcano plot
)

Now bp is ready to go — just pass it as the second argument to forge_analysis():

res <- forge_analysis(exp, bp)

Here’s the full lineup of available step functions:

Function	Description	Requires Glycan Structure	Experiment Type
Preprocessing
`step_ident_overview()`	Summarize the experiment (identification overview)	No	GP/G
`step_plot_qc()`	Generate quality control plots	No	GP/G
`step_preprocess()`	Preprocess the experiment (normalization, imputation, etc.)	No	GP/G
`step_subset_groups()`	Subset the experiment to specific groups	No	GP/G
`step_adjust_protein()`	Adjust glycoform quantification by protein abundance	No	GP
Differential Analysis
`step_dea_limma()`	Differential expression analysis using limma	No	GP/G
`step_dea_ttest()`	Differential expression analysis using t-test	No	GP/G
`step_dea_anova()`	Differential expression analysis using ANOVA	No	GP/G
`step_dea_wilcox()`	Differential expression analysis using Wilcoxon test	No	GP/G
`step_dea_kruskal()`	Differential expression analysis using Kruskal-Wallis test	No	GP/G
Visualization
`step_volcano()`	Create volcano plot from DEA results	No	GP/G
`step_heatmap()`	Create heatmap plot	No	GP/G
`step_logo()`	Create logo plot for glycosylation sites	No	GP
`step_sig_boxplot()`	Boxplots for significant variables from DEA	No	GP/G
Dimension Reduction
`step_pca()`	Principal Component Analysis	No	GP/G
`step_tsne()`	t-SNE analysis	No	GP/G
`step_umap()`	UMAP analysis	No	GP/G
`step_plsda()`	Partial Least Squares Discriminant Analysis	No	GP/G
`step_oplsda()`	Orthogonal Partial Least Squares Discriminant Analysis	No	GP/G
Correlation & Enrichment
`step_correlation()`	Pairwise correlation analysis	No	GP/G
`step_sig_enrich_go()`	GO enrichment analysis on DE variables	No	GP
`step_sig_enrich_kegg()`	KEGG enrichment analysis on DE variables	No	GP
`step_sig_enrich_reactome()`	Reactome enrichment analysis on DE variables	No	GP
`step_sig_enrich_ncg()`	NCG enrichment analysis on DE variables	No	GP
`step_sig_enrich_wp()`	WikiPathways enrichment analysis on DE variables	No	GP
`step_sig_enrich_do()`	Disease Ontology enrichment analysis on DE variables	No	GP
Advanced Analysis
`step_infer_structure()`	Infer glycan structures from glycan compositions	No	GP/G
`step_derive_traits()`	Calculate glycan derived traits	Yes	GP/G
`step_quantify_dynamic_motifs()`	Quantify dynamic glycan motifs	Yes	GP/G
`step_quantify_branch_motifs()`	Quantify N-glycan branch motifs	Yes	GP/G
`step_roc()`	ROC analysis for biomarker discovery	No	GP/G
Survival Analysis
`step_cox()`	Cox proportional hazards model	No	GP/G

GP: glycoproteomics, G: glycomics

Step dependencies

Some steps need data that other steps produce. For instance, step_volcano() needs results from a DEA step like step_dea_limma(). So this blueprint won’t work:

blueprint(  # this will error
  step_preprocess(),      # Preprocess (normalization, imputation, etc.)
  step_volcano(),         # Volcano plot
)

The easiest way to figure out dependencies is to check the documentation for each step. For example, ?step_volcano() tells you it “requires one of the DEA steps to be run: step_dea_limma(), step_dea_ttest(), step_dea_wilcox()”. So as long as you run one of these DEA steps before step_volcano(), you’re good to go:

bp <- blueprint(
  step_preprocess(),      # Preprocess (normalization, imputation, etc.)
  step_dea_ttest(),       # DEA with t-test
  step_volcano(),         # Volcano plot
)

Steps and their dependencies don’t need to be right next to each other — all intermediate data gets stored along the way, so the blueprint is valid as long as the required data exists somewhere upstream:

bp <- blueprint(
  step_preprocess(),      # Preprocess (normalization, imputation, etc.)
  step_dea_ttest(),       # DEA with t-test (generates data required by `step_volcano()`)
  step_heatmap(),         # Heatmap
  step_volcano(),         # Volcano plot (requires data generated by `step_dea_ttest()`)
)
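
The upstream rule can be sketched in a few lines of base R. This is a toy model for intuition only, not how glysmith validates blueprints: the `steps` table, the `check_plan()` helper, and the field name `dea_res` are all hypothetical, and only `exp` and `sig_exp` are names that appear in this vignette.

```r
# Toy model, not glysmith internals: each step lists the data it needs and makes.
# "dea_res" is a hypothetical field name standing in for "DEA results".
steps <- list(
  preprocess = list(needs = "exp",     makes = "exp"),
  dea_ttest  = list(needs = "exp",     makes = c("dea_res", "sig_exp")),
  heatmap    = list(needs = "exp",     makes = character(0)),
  volcano    = list(needs = "dea_res", makes = character(0))
)

# Walk the plan in order and report the first step whose inputs are absent.
check_plan <- function(plan, available = "exp") {
  for (name in plan) {
    absent <- setdiff(steps[[name]]$needs, available)
    if (length(absent) > 0) {
      return(sprintf("%s is missing: %s", name, paste(absent, collapse = ", ")))
    }
    # Everything this step makes stays available to all later steps.
    available <- union(available, steps[[name]]$makes)
  }
  "ok"
}

check_plan(c("preprocess", "volcano"))
#> [1] "volcano is missing: dea_res"
check_plan(c("preprocess", "dea_ttest", "heatmap", "volcano"))
#> [1] "ok"
```

Because `available` only ever grows, a step’s inputs can come from any earlier step, not just the one directly before it.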

The `on` parameter

Many step functions have an on parameter that controls which data they operate on, and therefore which upstream steps they depend on. For example, step_heatmap() defaults to on = "exp", which plots a heatmap from the experiment (after preprocessing, if you did that). But you can point it at other data, such as "sig_exp", instead; just remember that you’ll need to run a DEA step first:

bp <- blueprint(
  step_preprocess(),             # Preprocess (normalization, imputation, etc.)
  step_dea_limma(),              # DEA with limma
  step_heatmap(on = "sig_exp"),  # Heatmap on significantly expressed variables
)

Check ?step_dea_limma() and you’ll see it generates a field called exactly "sig_exp". This is how data flows in glysmith: each step stores its results in a shared context object, where downstream steps can grab what they need.

Let’s see another example:

bp <- blueprint(
  # Data required: `exp`
  # Data generated: `exp` (overwrite)
  step_preprocess(),

  # Data required: `exp`
  # Data generated: `trait_exp` (trait experiment)
  step_derive_traits(),

  # Data required: `trait_exp` (specified by `on`)
  # Data generated: `sig_trait_exp` (significantly different trait experiment)
  step_dea_limma(on = "trait_exp"),

  # Data required: `sig_trait_exp`
  step_heatmap(on = "sig_trait_exp"),
)

Here’s the general rule for dataflow in glysmith: steps don’t overwrite data generated by earlier steps. The exception is step_preprocess(), which updates exp with the preprocessed version (but don’t worry — the raw data is safely backed up in raw_exp).
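
As a rough illustration of this rule, here is a toy context in base R. It is a sketch under stated assumptions, not glysmith’s implementation: `put()` and `sig_name()` are hypothetical helpers, and the `sig_<name>` pattern is only inferred from the examples in this vignette ("exp" gives "sig_exp", "trait_exp" gives "sig_trait_exp").

```r
# Toy model of the shared context, not glysmith's actual implementation.
# Naming pattern inferred from the examples: DEA on "<name>" yields "sig_<name>".
sig_name <- function(on = "exp") paste0("sig_", on)

# Store a result, refusing to overwrite an existing field unless allowed to.
put <- function(ctx, key, value, overwrite = FALSE) {
  if (key %in% names(ctx) && !overwrite) {
    stop(sprintf("'%s' already exists in the context", key))
  }
  ctx[[key]] <- value
  ctx
}

ctx <- list(exp = "raw data")

# Preprocessing is the one exception: back up exp, then replace it.
ctx <- put(ctx, "raw_exp", ctx$exp)
ctx <- put(ctx, "exp", "preprocessed data", overwrite = TRUE)

# Every other step only adds new fields.
ctx <- put(ctx, "trait_exp", "derived traits")
ctx <- put(ctx, sig_name("trait_exp"), "significant traits")

paste(names(ctx), collapse = ", ")
#> [1] "exp, raw_exp, trait_exp, sig_trait_exp"
```

Calling `put(ctx, "trait_exp", "something else")` at this point would stop with an error, which mirrors the no-overwrite rule.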

To build valid blueprints, you’ll need to know what each step needs and what it produces. Let’s look at a more complex example to cement this. Try tracing through the dataflow yourself:

bp <- blueprint(
  step_ident_overview(),
  step_preprocess(),
  step_pca(),
  
  # DEA
  step_dea_limma(),
  step_volcano(),
  step_heatmap(on = "sig_exp"),

  # Derived trait analysis
  step_derive_traits(),
  step_dea_limma(on = "trait_exp"),
  step_heatmap(on = "sig_trait_exp"),

  # Branch motif analysis
  step_quantify_branch_motifs(),
  step_dea_limma(on = "branch_motif_exp"),
  step_heatmap(on = "sig_branch_motif_exp")
)

Branches

As we mentioned, steps can’t overwrite data from earlier steps. But what if you want to compare different DEA methods? You might try something like this:

blueprint(
  step_preprocess(),
  step_dea_limma(),
  step_volcano(),
  step_dea_ttest(),
  step_volcano(),
)

This won’t fly for two reasons:

1. step_dea_ttest() would overwrite the sig_exp from step_dea_limma().
2. step_volcano() shows up twice, which isn’t allowed.

These rules keep table and plot names unambiguous.

To pull this off, use branches. So far our blueprints have been linear — steps run one after another. To branch out, just wrap related steps in br():

bp <- blueprint(
  step_preprocess(),
  br("limma", step_dea_limma(), step_volcano()),
  br("t-test", step_dea_ttest(), step_volcano()),
)

The first argument names the branch, and the rest are the steps to run inside it. Each branch gets its own isolated context, so the step_volcano() in the “limma” branch naturally picks up the sig_exp from the step_dea_limma() in that same branch. Branches can also access data from before the branch point — so both branches here can use the preprocessed exp from step_preprocess().
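
The isolation described above can be pictured with plain R lists. This is a hedged sketch, not how br() is implemented: `run_branch()` is a hypothetical helper, and the strings merely stand in for real results.

```r
# Toy model of branch isolation, not glysmith's actual implementation.
# Each branch starts from a copy of the context at the branch point.
run_branch <- function(parent_ctx, dea_method) {
  ctx <- parent_ctx                                 # modifying ctx leaves parent_ctx untouched
  ctx$sig_exp <- paste("sig_exp from", dea_method)  # written inside this branch only
  ctx$volcano <- paste("volcano of", ctx$sig_exp)   # reads this branch's sig_exp
  ctx
}

# Context at the branch point: only the preprocessed experiment exists.
trunk <- list(exp = "preprocessed exp")

branches <- list(
  limma    = run_branch(trunk, "limma"),
  `t-test` = run_branch(trunk, "t-test")
)

branches$limma$volcano
#> [1] "volcano of sig_exp from limma"
branches$`t-test`$volcano
#> [1] "volcano of sig_exp from t-test"
names(trunk)
#> [1] "exp"
```

Both branches see `exp` from before the branch point, each has its own `sig_exp`, and nothing written inside a branch leaks back into the trunk or into the other branch.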

Summary

In this vignette, we covered how to build custom blueprints, how the on parameter and step dependencies work, and how to use branches.

All of this takes some practice, so if you’d rather not dive deep into the details, remember you can always have an LLM generate a blueprint for you with inquire_blueprint().
