This document shows how to use functions included in the package to document and reproduce a clustering workflow from the app. These functions do not cover every step (feature selection, for example), but they let users of the Shiny app reproduce the resulting heatmaps and boxplots. The example dataset used here is the iris dataset.
library(visxhclust)
library(dplyr)
library(ggplot2)
First we split the numeric and categorical variables. Scaling is applied later, when the distance matrix is computed.
numeric_data <- iris %>% select(where(is.numeric))
annotation_data <- iris %>% select(where(is.factor))
Let’s check the dataset for highly correlated variables that will likely skew the clusters with redundant information:
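One way to run this check is sketched below with base R's cor(); it does not use any plotting helper from the package. The 0.9 threshold and the names cor_matrix, high and high_pairs are illustrative assumptions, not part of the package.

```r
# Keep only the numeric columns of iris (base R equivalent of the split above)
numeric_data <- iris[, sapply(iris, is.numeric)]

# Pairwise Pearson correlations between all numeric variables
cor_matrix <- cor(numeric_data)
print(round(cor_matrix, 2))

# List each pair once (upper triangle) whose absolute correlation
# exceeds an illustrative threshold; 0.9 is an assumption, not a package default
threshold <- 0.9
high <- which(abs(cor_matrix) > threshold & upper.tri(cor_matrix), arr.ind = TRUE)
high_pairs <- data.frame(
  var1 = rownames(cor_matrix)[high[, "row"]],
  var2 = colnames(cor_matrix)[high[, "col"]],
  r = cor_matrix[high]
)
print(high_pairs)
```

A numeric table like this complements a correlation plot: it makes the choice of which variable to drop easy to record alongside the rest of the workflow.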
As seen above, petal length and width are highly correlated, so we keep only one of them:
subset_data <- numeric_data %>% select(Sepal.Length, Sepal.Width, Petal.Width)
The clustering itself takes three steps: computing a distance matrix, computing the hierarchical clusters, and cutting the tree to obtain the desired number of clusters. In the app, each of these steps has matching parameters: whether to apply scaling and which distance/similarity metric to use, the linkage method, and the number of clusters.
scaling <- TRUE
distance_method <- "euclidean"
linkage_method <- "ward.D2"
# this assumes that, in the app, we identified 3 as the optimal number of clusters
k <- 3
These parameters are used in three functions that the app also uses: compute_dmat, compute_clusters and cut_clusters. You can check the documentation for each function on the package website, or interactively through R's built-in help (for example, ?compute_dmat).
dmat <- compute_dmat(subset_data, distance_method, TRUE)
clusters <- compute_clusters(dmat, linkage_method)
cluster_labels <- cut_clusters(clusters, k)
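For readers who want to see the three steps without the package, here is a rough base R sketch using scale(), dist(), hclust() and cutree(). Whether the package functions wrap these exact calls, and whether they scale the data the same way, is an assumption; the labels may differ from those the package produces. The names features, scaled, dmat_base, tree and labels_base are illustrative.

```r
# Same three variables kept after the correlation check
features <- iris[, c("Sepal.Length", "Sepal.Width", "Petal.Width")]

# Z-score scaling (assumed here; the package may scale differently)
scaled <- scale(features)

# Step 1: distance matrix
dmat_base <- dist(scaled, method = "euclidean")

# Step 2: hierarchical clustering with Ward linkage
tree <- hclust(dmat_base, method = "ward.D2")

# Step 3: cut the tree into k = 3 clusters
labels_base <- cutree(tree, k = 3)

# Number of observations assigned to each cluster
print(table(labels_base))
```

Keeping the parameters (metric, linkage, k) next to these calls is what makes the workflow reproducible: changing any one of them can change the cluster assignments.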
Now we can check both the heatmap with its dendrogram and the boxplots. The package includes a function that covers most of the steps needed to produce the heatmap: cluster_heatmaps(). It plots the dendrogram, the annotation layer, the heatmap of the clustered data and the heatmap of the remaining data not used for clustering. In the Shiny app this is done automatically. Outside the app, plotting the annotation and the unselected data are optional steps, and the annotations require an extra step with the function create_annotations(). The colors used in the app are also exported by the package as the variable cluster_colors.
species_annotation <- create_annotations(iris, "Species")
cluster_heatmaps(scale(subset_data), clusters, k, cluster_colors, annotation = species_annotation)
In addition to the heatmap, the boxplots in the app are also available through functions. There are two steps required to show data through box plots: annotating the original data with the cluster and plotting it.
annotated_data <- annotate_clusters(subset_data, cluster_labels, TRUE)
cluster_boxplots(annotated_data)
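The two steps, annotating and then plotting, can also be sketched in base R. This is an illustration under the assumption that a long table with one row per observation and feature is a suitable input for box plots; it is not the package's implementation, and the names scaled, labels and long are hypothetical.

```r
# Recompute cluster labels with base R so the example is self-contained
scaled <- scale(iris[, c("Sepal.Length", "Sepal.Width", "Petal.Width")])
labels <- cutree(hclust(dist(scaled), method = "ward.D2"), k = 3)

# Step 1: annotate. Build a long table with one row per observation and feature.
# as.vector() unrolls the matrix column by column, so each feature name is
# repeated nrow(scaled) times and the label vector is recycled once per column.
long <- data.frame(
  cluster = factor(rep(labels, times = ncol(scaled))),
  feature = rep(colnames(scaled), each = nrow(scaled)),
  value = as.vector(scaled)
)

# Step 2: plot. One box per cluster and feature combination.
boxplot(value ~ cluster + feature, data = long, las = 2,
        xlab = "", ylab = "Scaled value")
```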