
Map Functions Over Glycan Structure Vectors
These functions apply a function to each unique structure in a glycan structure vector, taking advantage of hash-based deduplication to avoid redundant computation. They are similar to the purrr mapping functions, but optimized for glycan structure vectors.
Usage
smap(.x, .f, ..., .parallel = FALSE)
smap_vec(.x, .f, ..., .ptype = NULL, .parallel = FALSE)
smap_lgl(.x, .f, ..., .parallel = FALSE)
smap_int(.x, .f, ..., .parallel = FALSE)
smap_dbl(.x, .f, ..., .parallel = FALSE)
smap_chr(.x, .f, ..., .parallel = FALSE)
smap_structure(.x, .f, ..., .parallel = FALSE)
Arguments
- .x
A glycan structure vector (glyrepr_structure).
- .f
A function that takes an igraph object and returns a result. Can be a function, purrr-style lambda (~ .x$attr), or a character string naming a function.
- ...
Additional arguments passed to .f.
- .parallel
Logical; whether to use parallel processing. If FALSE (default), parallel processing is disabled. Set to TRUE to enable parallel processing.
- .ptype
A prototype for the return type (for smap_vec).
Value
- smap(): A list
- smap_vec(): An atomic vector of the type specified by .ptype
- smap_lgl/int/dbl/chr(): Atomic vectors of the corresponding type
- smap_structure(): A new glyrepr_structure object
Details
These functions compute .f only once for each unique structure, then map the results back to the original vector positions. When the vector contains duplicate structures, this avoids redundant calls: .f runs once per unique structure rather than once per element.
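The compute-once-then-expand idea can be sketched in base R. This is only an illustration under stated assumptions, not the package's implementation: plain strings stand in for glycan structures, `unique()` and `match()` stand in for the hash-based deduplication, and `dedup_map()` and `expensive_f()` are hypothetical names introduced for this sketch.

```r
# Hypothetical helper: apply f once per distinct input, then expand the
# results back to the positions of the original vector.
dedup_map <- function(x, f, ...) {
  uniq <- unique(x)             # distinct inputs, in first-seen order
  res <- lapply(uniq, f, ...)   # f runs once per distinct input
  res[match(x, uniq)]           # one result per original element
}

# Hypothetical stand-in for .f that counts how often it is invoked
calls <- 0L
expensive_f <- function(s) {
  calls <<- calls + 1L
  nchar(s)
}

# Strings stand in for structures; the first one appears twice
x <- c("Gal-GalNAc", "Man", "Gal-GalNAc")
out <- dedup_map(x, expensive_f)

length(out)  # same length as x
calls        # number of times expensive_f actually ran
```

Here `expensive_f()` runs twice for a three-element input, which mirrors the behaviour described above: the cost follows the number of unique structures, not the length of the vector.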
Return Types:
- smap(): Returns a list with the same length as .x
- smap_vec(): Returns an atomic vector with the same length as .x
- smap_lgl(): Returns a logical vector
- smap_int(): Returns an integer vector
- smap_dbl(): Returns a double vector
- smap_chr(): Returns a character vector
- smap_structure(): Returns a new glycan structure vector (.f must return igraph objects)
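As a loose analogy for these typed variants, base R's `vapply()` makes a similar promise about the result type through its `FUN.VALUE` argument. The sketch below uses plain strings rather than glycan structures and says nothing about how the smap functions are implemented; in particular, whether they raise an error or coerce when .f returns a different type is not stated on this page.

```r
# Strings stand in for structures (an assumption made for illustration)
x <- c("Gal-GalNAc", "Man", "Gal-GalNAc")

# Integer result, comparable in spirit to smap_int()
lens <- vapply(x, nchar, FUN.VALUE = integer(1), USE.NAMES = FALSE)

# Logical result, comparable in spirit to smap_lgl()
is_long <- vapply(x, function(s) nchar(s) > 5L,
                  FUN.VALUE = logical(1), USE.NAMES = FALSE)

# With vapply(), a result of the wrong type is an error, not a silent coercion
mismatch <- tryCatch(
  vapply(x, nchar, FUN.VALUE = character(1)),
  error = function(e) "type error"
)
```

In each case the result has one element per input, matching the same-length guarantee listed above.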
Examples
# Create a structure vector with duplicates
core1 <- o_glycan_core_1()
core2 <- n_glycan_core()
structures <- glycan_structure(core1, core2, core1) # core1 appears twice
# Map a function that counts vertices - only computed twice, not three times
smap_int(structures, igraph::vcount)
#> [1] 2 5 2
# Map a function that returns logical
smap_lgl(structures, function(g) igraph::vcount(g) > 5)
#> [1] FALSE FALSE FALSE
# Use purrr-style lambda functions
smap_int(structures, ~ igraph::vcount(.x))
#> [1] 2 5 2
smap_lgl(structures, ~ igraph::vcount(.x) > 5)
#> [1] FALSE FALSE FALSE
# Map a function that modifies structure (must return igraph)
add_vertex_names <- function(g) {
  if (!("name" %in% igraph::vertex_attr_names(g))) {
    igraph::set_vertex_attr(g, "name", value = paste0("v", seq_len(igraph::vcount(g))))
  } else {
    g
  }
}
smap_structure(structures, add_vertex_names)
#> <glycan_structure[3]>
#> [1] Gal(b1-3)GalNAc(a1-
#> [2] Man(a1-3)[Man(a1-6)]Man(b1-4)GlcNAc(b1-4)GlcNAc(?1-
#> [3] Gal(b1-3)GalNAc(a1-
#> # Unique structures: 2
if (FALSE) { # \dontrun{
# Parallel processing examples
# Example 1: Manual control of parallel processing
# Sequential processing (the default)
result_seq <- smap_int(structures, igraph::vcount, .parallel = FALSE)
# Parallel processing (requires a parallel backend to be set up first), e.g.:
# future::plan(future::multisession, workers = 4)
result_par <- smap_int(structures, igraph::vcount, .parallel = TRUE)
# Example 2: Parallel processing for large datasets
# Parallel processing is disabled by default, so it must be enabled explicitly.
# In practice, a large structure vector might come from parsing a large
# glycomics dataset with many different structures:
# large_structures <- your_large_glycan_dataset # Many unique structures
# result_parallel <- smap_int(large_structures, igraph::vcount, .parallel = TRUE)
# Example 3: Complex computation that benefits from parallelization
complex_analysis <- function(g) {
  # Simulate expensive computation (e.g., complex graph analysis)
  Sys.sleep(0.01) # 10ms delay per unique structure
  list(
    vcount = igraph::vcount(g),
    ecount = igraph::ecount(g),
    diameter = igraph::diameter(g),
    transitivity = igraph::transitivity(g)
  )
}
# Expensive computations may benefit from parallel processing even with few unique structures
# Set up a parallel backend first: future::plan(future::multisession, workers = 4)
system.time({
  results_par <- smap(structures, complex_analysis, .parallel = TRUE)
})
# Compare with sequential processing
system.time({
  results_seq <- smap(structures, complex_analysis, .parallel = FALSE)
})
# Example 4: Practical use case with real data
# When processing large glycomics datasets:
# glycan_data <- parse_glycan_file("large_dataset.txt") # Many structures
#
# # Compute molecular properties - manually enable parallel if needed
# molecular_weights <- smap_dbl(glycan_data, calculate_molecular_weight, .parallel = TRUE)
#
# # Force parallel for expensive computations even with fewer structures
# complexity_scores <- smap_dbl(glycan_data, expensive_complexity_analysis,
# .parallel = TRUE)
} # }