Clustering is a fundamental step in the analysis of biological and omics data. It is used to group objects (genes, proteins) that share related functions or expression patterns, or that are known to interact. In microarray or RNA-seq experiments, gene clustering is often paired with a heatmap representation for data visualization.
Choosing the right clustering tool for your analysis
Many clustering methods and algorithms have been developed. They are commonly classified into partitioning (e.g. k-means), hierarchical (connectivity-based), density-based, model-based and graph-based approaches.
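To make the partitioning family more concrete, here is a minimal sketch of k-means (Lloyd's algorithm) in plain Python. The toy expression profiles, the starting centroids and the function name `kmeans` are assumptions made for illustration; they are not taken from any of the tools discussed in this post.

```python
def kmeans(points, centroids, iterations=10):
    """Lloyd's algorithm: alternate an assignment step and an update step."""
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: each point joins its nearest centroid
        # (squared Euclidean distance).
        labels = [
            min(range(len(centroids)),
                key=lambda k: sum((x - c) ** 2 for x, c in zip(p, centroids[k])))
            for p in points
        ]
        # Update step: each centroid moves to the mean of its members.
        new_centroids = []
        for k in range(len(centroids)):
            members = [p for p, lab in zip(points, labels) if lab == k]
            if members:
                new_centroids.append(
                    tuple(sum(dim) / len(members) for dim in zip(*members)))
            else:
                # An empty cluster keeps its previous centroid.
                new_centroids.append(centroids[k])
        if new_centroids == centroids:
            break  # converged: the centroids no longer move
        centroids = new_centroids
    return labels, centroids


# Hypothetical data: six genes measured in two conditions.
profiles = [(1.0, 1.2), (0.8, 1.0), (1.1, 0.9),
            (5.0, 5.2), (5.1, 4.9), (4.8, 5.0)]
labels, centers = kmeans(profiles, centroids=[(0.0, 0.0), (6.0, 6.0)])
print(labels)
```

In practice the number of clusters and the starting centroids have to be chosen by the user, which is one reason why partitioning and hierarchical methods can give different answers on the same data.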
To help you choose between all the existing clustering tools, we asked OMICtools members to vote for their favorite software. Here are the top 3 tools, chosen by 23 voters.
First place for ClustEval
ClustEval is a web-based clustering analysis platform developed at the Max Planck Institute for Informatics and the University of Southern Denmark. It is designed to objectively compare the performance of various clustering methods across different datasets.
More precisely, ClustEval compared the performance of 18 of the most widely used clustering methods on 24 different datasets. These datasets include gene expression data, protein sequence similarity, protein structure similarity, social networks and word sense disambiguation, among others. The performance of each clustering method is then evaluated with an F1-score (the harmonic mean of precision and recall).
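As an illustration of how an F1-score can be applied to a clustering, the sketch below counts pairs of objects: a pair is a true positive when both the predicted clustering and the gold standard place the two objects together. Pair counting is one common convention and is assumed here; this post does not state exactly how ClustEval computes its score. The function name `pairwise_f1` and the toy labels are hypothetical.

```python
from itertools import combinations


def pairwise_f1(predicted, gold):
    """F1-score over object pairs; both arguments map object -> cluster label."""
    tp = fp = fn = 0
    for a, b in combinations(sorted(gold), 2):
        same_pred = predicted[a] == predicted[b]
        same_gold = gold[a] == gold[b]
        if same_pred and same_gold:
            tp += 1  # pair correctly grouped together
        elif same_pred:
            fp += 1  # pair grouped together, but should be apart
        elif same_gold:
            fn += 1  # pair split apart, but should be together
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    # Harmonic mean of precision and recall.
    return 2 * precision * recall / (precision + recall)


# Hypothetical gold standard and clustering result for five genes.
gold = {"g1": "A", "g2": "A", "g3": "A", "g4": "B", "g5": "B"}
predicted = {"g1": 0, "g2": 0, "g3": 1, "g4": 1, "g5": 1}
score = pairwise_f1(predicted, gold)
print(score)  # 0.5: precision and recall are both 2/4
```

Here gene g3 is assigned to the wrong group, which breaks two correct pairs and creates two wrong ones, so precision and recall both drop to 0.5.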
Second place for Babelomics
Babelomics is a web application developed by the Computational Genomics Department of the Principe Felipe Research Center in Valencia. It performs a wide range of functional analyses of gene expression and genomic data, from processing to expression analysis and gene set enrichment.
In its current version, Babelomics 5, the website offers a user-friendly and intuitive interface for clustering microarray or RNA-seq data with one of three methods: UPGMA, SOTA, and k-means. The result can then be visualized as a heatmap. Example datasets and analyses are provided for every functionality of the application, and tutorials are also available.
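For readers unfamiliar with UPGMA, it is average-linkage hierarchical clustering: the two clusters with the smallest mean pairwise distance are merged, and this is repeated until the desired number of groups remains. The plain-Python sketch below illustrates the idea under stated assumptions; it is not Babelomics code, and the function name `upgma`, the Euclidean distance and the toy profiles are all hypothetical choices.

```python
from itertools import combinations
from math import dist


def upgma(points, n_clusters):
    """Average-linkage agglomerative clustering, stopped at n_clusters groups."""
    # Start with every object in its own cluster (stored as lists of indices).
    clusters = [[i] for i in range(len(points))]

    def average_distance(c1, c2):
        # Mean Euclidean distance over all cross-cluster pairs.
        total = sum(dist(points[i], points[j]) for i in c1 for j in c2)
        return total / (len(c1) * len(c2))

    while len(clusters) > n_clusters:
        # Find the two closest clusters; combinations guarantees a < b.
        a, b = min(
            combinations(range(len(clusters)), 2),
            key=lambda pair: average_distance(clusters[pair[0]], clusters[pair[1]]),
        )
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]
    return clusters


# Hypothetical data: six genes measured in two conditions.
profiles = [(1.0, 1.2), (0.8, 1.0), (1.1, 0.9),
            (5.0, 5.2), (5.1, 4.9), (4.8, 5.0)]
groups = upgma(profiles, n_clusters=2)
print(sorted(sorted(g) for g in groups))
```

This naive version recomputes every distance at each merge, which is fine for a handful of genes but far too slow for a real expression matrix; dedicated tools use more efficient update rules.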
Third place for AltAnalyze
AltAnalyze is a comprehensive application for the analysis of single-cell and bulk RNA-seq data that can automate every step of gene expression and splicing analysis, including clustering and heatmap representation. It was developed in the Nathan Salomonis laboratory at Cincinnati Children’s Hospital Medical Center and the University of Cincinnati.
AltAnalyze offers many options for clustering algorithms and normalization, as well as unique features such as finding optimized clusters for single-cell analysis.
AltAnalyze can be downloaded and run on all operating systems, and comes with useful documentation (tutorials, blog, FAQ).
(Wiwie et al., 2015) Comparing the performance of biomedical clustering methods. Nature Methods.
(Alonso et al., 2015) Babelomics 5.0: functional interpretation for new generations of genomic data. Nucleic Acids Research.
(Emig et al., 2010) AltAnalyze and DomainGraph: analyzing and visualizing exon expression data. Nucleic Acids Research.
(Olson et al., 2016) Single-cell analysis of mixed-lineage states leading to a binary cell fate choice. Nature.