
Hierarchical Clustering for Glycomics and Glycoproteomics Data
Perform hierarchical clustering on the expression data. The function uses stats::hclust() to perform clustering and provides tidy results including cluster assignments, dendrogram data for plotting, and merge heights.
Arguments
- exp
A glyexp::experiment() object containing expression matrix and sample information.
- on
A character string specifying what to cluster. Either "variable" (default) to cluster variables/features, or "sample" to cluster samples/observations.
- k_values
A numeric vector specifying the number of clusters to cut the tree into. Default is c(2, 3, 4, 5). If NULL, no cluster assignments are returned.
- scale
A logical indicating whether to scale the data before clustering. Default is TRUE.
- add_info
A logical value. If TRUE (default), sample information from the experiment will be added to the result tibbles. If FALSE, only the clustering results are returned. Only applicable to gly_hclust().
- ...
Additional arguments passed to stats::dist() and stats::hclust(). Note: if both functions need a method parameter, use dist.method for the distance method and hclust.method for the clustering method.
- expr_mat
A numeric matrix with variables as rows and samples as columns.
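
The dist.method / hclust.method convention can be illustrated with a small base R sketch. The helper cluster_with_methods() below is hypothetical and is not part of the package; it only shows one way the two names could be routed to stats::dist() and stats::hclust(). The fallbacks "euclidean" and "complete" are the defaults of those base functions, and it is an assumption that the package falls back to the same values.

```r
# Hypothetical helper (not the package's code): pick the two method
# arguments out of `...` and hand each one to the matching base R function.
cluster_with_methods <- function(expr_mat, ...) {
  dots <- list(...)
  dist_method <- dots[["dist.method"]]
  hclust_method <- dots[["hclust.method"]]
  # Fall back to the base R defaults when a method is not supplied
  if (is.null(dist_method)) dist_method <- "euclidean"
  if (is.null(hclust_method)) hclust_method <- "complete"
  d <- stats::dist(expr_mat, method = dist_method)
  stats::hclust(d, method = hclust_method)
}

set.seed(1)
toy <- matrix(
  rnorm(24), nrow = 6,
  dimnames = list(paste0("V", 1:6), paste0("S", 1:4))
)
hc <- cluster_with_methods(toy, dist.method = "manhattan", hclust.method = "average")

# The hclust object records both choices
hc$dist.method
hc$method
```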
Value
A list containing:
- tidy_result: A list of tibbles with clustering results:
  - clusters: Cluster assignments containing the following columns:
    - variable or sample: Variable or sample name (depending on the on parameter)
    - cluster_k2, cluster_k3, etc.: Cluster assignments for different k values
  - dendrogram: Dendrogram segment data for plotting (if ggdendro is available) containing:
    - x, y, xend, yend: Segment coordinates for plotting
  - heights: Merge heights and steps containing the following columns:
    - merge_step: Step number in the clustering process
    - height: Height at which clusters are merged
    - n_clusters: Number of clusters remaining after this merge
  - labels: Labels and their positions (if ggdendro is available) containing:
    - x, y: Position coordinates
    - label: Label text
- raw_result: The raw hclust object from stats::hclust()
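
To make the shape of the clusters and heights tables concrete, the following sketch rebuilds comparable tables from a plain hclust object. It uses data.frame() instead of tibbles to stay dependency-free, the toy matrix is made up, and the construction is an assumption about how such tables can be derived, not the package's actual implementation.

```r
set.seed(42)
toy <- matrix(
  rnorm(30), nrow = 6,
  dimnames = list(paste0("V", 1:6), paste0("S", 1:5))
)
hc <- stats::hclust(stats::dist(toy))

# clusters: one column per requested k
k_values <- c(2, 3)
assignments <- stats::cutree(hc, k = k_values)  # matrix, one column per k
clusters <- data.frame(
  variable = rownames(assignments),
  assignments,
  check.names = FALSE
)
names(clusters)[-1] <- paste0("cluster_k", k_values)
rownames(clusters) <- NULL

# heights: each merge reduces the number of clusters by one
n_items <- length(hc$order)
heights <- data.frame(
  merge_step = seq_along(hc$height),
  height = hc$height,
  n_clusters = n_items - seq_along(hc$height)
)

clusters
heights
```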
Details
The function performs log2 transformation on the expression data (log2(x + 1)) before clustering. When on = "variable" (default), variables are clustered based on their expression patterns across samples. When on = "sample", samples are clustered based on their expression profiles across variables.
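
A minimal base R sketch of the transformation and the two orientations, using a made-up 3 x 4 matrix with variables as rows and samples as columns. Because stats::dist() computes distances between rows, clustering samples amounts to transposing the matrix first; that the package handles on = "sample" by transposing is an assumption here, and the scale step is left out.

```r
# Toy expression matrix: 3 variables (rows) x 4 samples (columns)
expr_mat <- matrix(
  c(0, 3, 7, 15,
    1, 1, 31, 63,
    0, 0, 255, 127),
  nrow = 3, byrow = TRUE,
  dimnames = list(c("G1", "G2", "G3"), c("S1", "S2", "S3", "S4"))
)

# log2(x + 1) keeps zeros finite: log2(0 + 1) is 0
log_mat <- log2(expr_mat + 1)

# on = "variable": rows are the items, so distances are between variables
d_variable <- stats::dist(log_mat)

# on = "sample": transpose so that samples become the rows
d_sample <- stats::dist(t(log_mat))

attr(d_variable, "Size")  # number of variables compared
attr(d_sample, "Size")    # number of samples compared
```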
gly_hclust() is the top-level API that works with glyexp::experiment() objects and supports the add_info parameter for joining experiment metadata.
gly_hclust_() is the underlying API that works with matrices directly, providing more flexibility for users who don't use the glyexp package.
Distance Calculation:
Distance is calculated using stats::dist() with the specified method.
Clustering Method:
Hierarchical clustering is performed using stats::hclust() with the specified method.
Cluster Assignment:
The dendrogram is cut at different heights to produce cluster assignments for the specified k values using stats::cutree().
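
The link between "cutting at a height" and "asking for k clusters" can be checked directly in base R. The sketch below uses random toy data and assumes distinct merge heights; it shows that any height strictly between two consecutive merges yields the same partition as requesting the corresponding k.

```r
set.seed(7)
toy <- matrix(
  rnorm(40), nrow = 8,
  dimnames = list(paste0("V", 1:8), paste0("S", 1:5))
)
hc <- stats::hclust(stats::dist(toy), method = "complete")

n_items <- nrow(toy)
k <- 3

# After merge step (n_items - k) exactly k clusters remain, so a cut height
# between that merge and the next one reproduces the k-cluster partition.
cut_height <- mean(hc$height[c(n_items - k, n_items - k + 1)])

by_k <- stats::cutree(hc, k = k)
by_h <- stats::cutree(hc, h = cut_height)

same_partition <- identical(by_k, by_h)
same_partition
```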
Required packages
This function only uses base R packages and does not require additional dependencies.
For enhanced dendrogram plotting capabilities, the ggdendro package is recommended but not required.
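
As an illustration of the optional dependency, the sketch below extracts segment and label data from an hclust object only when ggdendro is installed, and falls back to NULL otherwise. It uses the built-in USArrests data and ggdendro's dendro_data(), segment(), and label() functions; whether the package obtains its dendrogram and labels tables in exactly this way is an assumption.

```r
hc <- stats::hclust(stats::dist(USArrests[1:8, ]))

if (requireNamespace("ggdendro", quietly = TRUE)) {
  dd <- ggdendro::dendro_data(hc, type = "rectangle")
  segments <- ggdendro::segment(dd)  # columns x, y, xend, yend
  labels <- ggdendro::label(dd)      # columns x, y, label
} else {
  # Without ggdendro, only the raw hclust object is available
  segments <- NULL
  labels <- NULL
}
```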