Your top 3 Venn diagram tools

Venn diagrams are simple yet very useful tools for showing all possible logical relations between a finite collection of data sets. In Venn diagrams, sets are usually drawn as overlapping circles. Elements shared between two sets lie in the intersection, while elements unique to one set remain outside it.

Venn diagrams in biology

In biology and omics, Venn diagrams can be used for a variety of purposes, such as the comparison of different lists of genes or proteins (generally 2 or 3) to identify similarities and represent them in two dimensions. Most software tools allow easy extraction of the data and let you customize the diagrams.

Continuing our series on data visualization tools, OMICtools members voted for their favorite Venn diagram software and websites. Here are the results from 54 voters.

Your number 1 Venn diagram generation tool: Venny

50% of you chose Venny as your number 1 favorite tool to generate Venn diagrams.

Venny is a web server created by Juan Carlos Oliveros from the BioinfoGP service at the Spanish National Biotechnology Centre. It can be used online or offline to generate Venn diagrams from up to four lists.

Its straightforward usage lets you create diagrams and extract data in 3 basic steps:

  1. Paste your lists of data (one element per row) and rename the lists
  2. Click on the numbers to get exclusive and common data between lists
  3. Right-click the figure to view and save the diagram
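
The exclusive and common elements reported in step 2 correspond to ordinary set operations. Here is a minimal Python sketch of that idea. The gene lists and the function name `compare_lists` are made up for illustration and are not part of Venny.

```python
# Hypothetical gene lists, one element per entry (illustrative only).
list_a = ["TP53", "BRCA1", "EGFR", "MYC"]
list_b = ["BRCA1", "MYC", "KRAS"]


def compare_lists(a, b):
    """Return the elements found only in a, only in b, and in both."""
    set_a, set_b = set(a), set(b)
    return {
        "only_a": sorted(set_a - set_b),  # exclusive to the first list
        "only_b": sorted(set_b - set_a),  # exclusive to the second list
        "common": sorted(set_a & set_b),  # the intersection
    }


regions = compare_lists(list_a, list_b)
for name, members in regions.items():
    print(name, len(members), members)
```

The counts printed for each region are the numbers you would expect to see inside the corresponding areas of a two-list diagram.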

Venny allows basic customization of your diagrams (line weight, font size and style).

Example Venn diagram generated with Venny

Shared second place for BioVenn and Venn diagram

43% of the OMICtools community voted for BioVenn and Venn diagram as their favorite tools!

BioVenn

BioVenn is a web application developed at the Centre for Molecular and Biomolecular Informatics that enables creation of Venn diagrams from up to 3 sets of data. Unlike in Venny, the diagrams in BioVenn are area-proportional, which means that the size of the circles and the overlaps correspond to the sizes of the data sets. BioVenn also comes with interesting features, such as the ability to directly upload a data set from a tab file, or to support a wide range of identifiers which can be linked to biological databases.
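
To give an idea of what "area-proportional" means for two sets, here is a rough Python sketch based on standard circle geometry. It is not BioVenn's own algorithm. The function names and the example sizes (100, 50 and 20 shared elements) are assumptions chosen for illustration. Each circle gets an area equal to its set size, and the distance between the centres is found by bisection so that the overlap area matches the number of shared elements.

```python
import math


def radius_for(count):
    """Radius of a circle whose area equals the set size."""
    return math.sqrt(count / math.pi)


def _clamp(x, lo=-1.0, hi=1.0):
    """Guard acos against tiny floating-point overshoots."""
    return max(lo, min(hi, x))


def lens_area(r1, r2, d):
    """Area of the overlap of two circles whose centres are d apart."""
    if d >= r1 + r2:
        return 0.0  # circles do not touch
    if d <= abs(r1 - r2):
        return math.pi * min(r1, r2) ** 2  # smaller circle fully inside
    a1 = r1 * r1 * math.acos(_clamp((d * d + r1 * r1 - r2 * r2) / (2 * d * r1)))
    a2 = r2 * r2 * math.acos(_clamp((d * d + r2 * r2 - r1 * r1) / (2 * d * r2)))
    prod = (-d + r1 + r2) * (d + r1 - r2) * (d - r1 + r2) * (d + r1 + r2)
    return a1 + a2 - 0.5 * math.sqrt(max(0.0, prod))


def centre_distance(n_a, n_b, n_common, tol=1e-9):
    """Bisect on the centre distance until the overlap area equals n_common."""
    r1, r2 = radius_for(n_a), radius_for(n_b)
    lo, hi = abs(r1 - r2), r1 + r2
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if lens_area(r1, r2, mid) > n_common:
            lo = mid  # too much overlap: move the circles apart
        else:
            hi = mid  # too little overlap: bring them closer
    return r1, r2, (lo + hi) / 2


# Hypothetical sizes: 100 and 50 elements, 20 of them shared.
r1, r2, d = centre_distance(100, 50, 20)
print(f"r1={r1:.3f} r2={r2:.3f} d={d:.3f} overlap={lens_area(r1, r2, d):.3f}")
```

With three sets an exact area-proportional layout with circles is not always possible, so real tools have to make additional design choices that this sketch does not cover.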

Venn diagram generated with BioVenn

Venn diagram

Venn diagram was developed by the VIB-UGent Center for Plant Systems Biology at Ghent University. This web application allows users to draw Venn diagrams from up to 6 data lists, in a symmetric or non-symmetric fashion. The diagrams can then be downloaded in SVG or PNG format. Moreover, Venn diagram can calculate the intersections of up to 30 different lists, making it a useful tool to identify common values between multiple data sets.
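
When the number of lists grows beyond what can be drawn, the intersections can still be computed. The sketch below is one possible way to do it in Python and is not taken from the Venn diagram web tool. The function name `exclusive_regions` and the input lists are hypothetical. Each element is assigned to the exact combination of lists that contain it.

```python
from collections import defaultdict


def exclusive_regions(named_lists):
    """Map each combination of list names to the elements found in exactly those lists."""
    membership = defaultdict(set)
    for name, items in named_lists.items():
        for item in set(items):  # ignore duplicates within a list
            membership[item].add(name)

    regions = defaultdict(list)
    for item, names in membership.items():
        regions[tuple(sorted(names))].append(item)
    return {combo: sorted(items) for combo, items in regions.items()}


# Hypothetical input; the same code works for 3 or 30 lists.
lists = {
    "A": ["g1", "g2", "g3"],
    "B": ["g2", "g3", "g4"],
    "C": ["g3", "g5"],
}
regions = exclusive_regions(lists)
for combo, items in sorted(regions.items()):
    print(" & ".join(combo), "->", items)
```

Only combinations that actually contain elements are stored, so the result stays small even though 30 lists allow more than a billion possible combinations.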

Venn diagram generated with the Venn diagram software

Bronze medal for InteractiVenn

A very close third place goes to InteractiVenn, chosen by 41% of voters as their favorite tool.

InteractiVenn is a sophisticated and flexible web-based tool that allows creation of Venn diagrams from up to 6 lists and analysis of set unions, while preserving the shape of the diagram. By displaying partial unions, the user is able to locate regions that combine unions of sets and their intersections, thus providing additional observations on the interactions between joined sets.
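
The idea of joining some sets before intersecting them with others can be illustrated with a few lines of Python. This is only a sketch of the underlying set logic, with hypothetical set names and a hypothetical helper `union_then_intersect`. It does not reproduce InteractiVenn's interface or code.

```python
def union_then_intersect(sets, to_merge, others):
    """Join the sets named in to_merge, then intersect the result with each set in others."""
    merged = set().union(*(sets[name] for name in to_merge))
    result = set(merged)
    for name in others:
        result &= sets[name]
    return merged, result


# Hypothetical sets of identifiers.
sets = {
    "A": {1, 2, 3},
    "B": {3, 4},
    "C": {2, 4, 5},
}
merged, shared = union_then_intersect(sets, to_merge=["A", "B"], others=["C"])
print("A union B:", sorted(merged))
print("(A union B) intersect C:", sorted(shared))
```

In this toy example, C shares one element with A and a different one with B. Looking at the union of A and B shows both at once, which is the kind of observation that partial unions are meant to make easier.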

With InteractiVenn, the user can choose text size, color and opacity, and can export the diagram in a vector format. Datasets can also be saved locally for later use on the website.

Venn diagram generated with InteractiVenn

References

(Oliveros, J.C., 2007-2015) Venny. An interactive tool for comparing lists with Venn’s diagrams. http://bioinfogp.cnb.csic.es/tools/venny/index.html

(Hulsen et al., 2008) BioVenn – a web application for the comparison and visualization of biological lists using area-proportional Venn diagrams. BMC Genomics.

Venn diagram: http://bioinformatics.psb.ugent.be/webtools/Venn/

(Heberle et al., 2015) InteractiVenn: a web-based tool for the analysis of sets through Venn diagrams. BMC Bioinformatics.

Your top 3 heatmap generation tools

Heatmaps are one of the most commonly used representation tools for synthesis of complex pools of data. By coding numerical values into colors, heatmaps enable quick representation of quantitative differences in expression levels of biological data.

Using heatmaps to visualize your data

Heatmaps are particularly useful for analysis of gene expression microarray data. Most heatmap representations are also combined with clustering methods to group genes and/or samples based on their expression patterns. Each gene is represented as a row and is color-coded to represent the intensity of its variation (either positive or negative) relative to a reference value. Biological samples are represented as columns in the grid.
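
To make the color coding concrete, here is a small Python sketch. It assumes a log2 ratio against a single reference value and a blue-white-red scale. Both are common choices, but they are assumptions made for this example, as are the gene names and expression values.

```python
import math


def log2_ratio(value, reference):
    """Variation of a value relative to the reference, on a log2 scale."""
    return math.log2(value / reference)


def diverging_colour(score, limit=2.0):
    """Map a score to a hex colour: blue (negative), white (zero), red (positive)."""
    s = max(-limit, min(limit, score)) / limit  # clamp, then scale to [-1, 1]
    fade = round(255 * (1 - abs(s)))            # 255 at zero, 0 at the limit
    if s >= 0:
        r, g, b = 255, fade, fade
    else:
        r, g, b = fade, fade, 255
    return f"#{r:02x}{g:02x}{b:02x}"


# Hypothetical expression matrix: rows are genes, columns are samples.
expression = {
    "geneA": [8.0, 2.0, 4.0],
    "geneB": [1.0, 4.0, 16.0],
}
reference = 4.0  # assumed reference expression level

for gene, values in expression.items():
    row = [diverging_colour(log2_ratio(v, reference)) for v in values]
    print(gene, row)
```

Each printed row is one line of the heatmap grid, with one color per sample.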

As part of the OMICtools series on data visualization tools, we asked members of the OMICtools community to vote for their favorite heatmap generation tools. Here are the top 3 heatmap generation tools, selected by 63 voters.

Your number 1 tool: Heatmapper

67% of you chose Heatmapper as your number 1 favorite tool to generate heatmaps.

A composite image of various heatmap representations in the Heatmapper interface

The Heatmapper software is a versatile tool that allows you to create a wide variety of heatmaps for many different data types, such as heatmaps for transcriptomic, proteomic and metabolomic data, as well as pairwise distance maps, image overlay heatmaps and geopolitical heatmaps.

You can upload your data as text, Excel, or tab-delimited tables and export the resulting heatmaps in various formats.

Being one of the few web-based and generalist heatmap representation tools, Heatmapper is particularly suited for users with limited computing resources or for non-specialists. Complex sets of data from microarray, RNA-seq, proteomic or metabolomic experiments can also be displayed and clustered using one of the five distance measurement methods available, including Pearson and Spearman rank correlation.
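
For readers curious about what a correlation-based distance looks like, here is a plain Python sketch. It uses the common convention "distance = 1 - correlation". Whether Heatmapper uses exactly this definition is an assumption, and the expression profiles are invented for the example.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equally long profiles."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def ranks(values):
    """1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = mean_rank
        i = j + 1
    return result


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


def correlation_distance(x, y, method=pearson):
    """Assumed convention: 0 for perfectly correlated profiles, 2 for opposite ones."""
    return 1.0 - method(x, y)


# Hypothetical expression profiles of two genes across four samples.
gene1 = [1.0, 2.0, 3.0, 4.0]
gene2 = [1.0, 4.0, 9.0, 16.0]
print("Pearson distance: ", correlation_distance(gene1, gene2))
print("Spearman distance:", correlation_distance(gene1, gene2, spearman))
```

The two profiles rise together but not linearly, so the Spearman distance is zero while the Pearson distance is small but positive. This is why the choice of distance can change how genes are grouped by clustering.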

Your second best heatmap generation software: Gitools

48% of the OMICtools community voted for Gitools as the best tool for generating heatmaps.

Gitools is a comprehensive and interactive heatmap generation software developed over several years at the Barcelona Biomedical Genomics Lab at the Biomedical Research Park in Barcelona (PRBB).

This software enables analysis and visualization of genomic data as interactive heatmaps. After uploading your data, Gitools lets you choose between different types of analyses over matrices and modules, such as enrichment analysis, correlations and overlaps. One of its unique features, Oncodrive, is a method to identify genes which are more altered than would be expected by chance, taking into account the whole matrix. The originality of the Gitools software lies in its capacity to navigate data and results in the form of interactive heatmaps and obtain detailed information by clicking on each cell of the map.

Practical tutorials, as well as examples of data and results, are provided on the Gitools main website, and explanatory videos can be found on the Barcelona Biomedical Genomics Lab YouTube channel.

Example of an explanatory video for Gitools about sorting and stratifying heatmaps.

Third place goes to Shinyheatmap

In third position, Shinyheatmap was chosen by 35% of voters.

A Shinyheatmap interactive heatmap generated from a large dataset input

Shinyheatmap is hosted online as an R Shiny web server application and may also be run locally from within RStudio.

It is designed as a user-friendly heatmap software and has a low memory footprint, which enables interactive visualization of very large datasets. Shinyheatmap also features a built-in high-performance web plug-in, fastheatmap, which can process datasets of millions of rows within seconds. As such, it is particularly suited for RNA-seq or NGS-driven studies. Shinyheatmap can generate both static and interactive heatmaps and allows the user to customize several parameters.

Our next OMICtools survey on data visualization will focus on Venn diagram tools – it’s your chance to have your say (you’ll receive a survey invite by email)!

References

(Babicki et al., 2016) Heatmapper: web-enabled heat mapping for all. Nucleic Acids Research.

(Perez-Llamas and Lopez-Bigas, 2011) Gitools: Analysis and visualization of genomic data using interactive heat-maps. PLoS ONE.

(Khomtchouk et al., 2017) shinyheatmap: Ultrafast low memory heatmap web interface for big data genomics. PLoS ONE.

Your top 3 Circos plot generation tools

Making great images of your data

With the growing amount of biological data generated, innovative bioinformatics tools have been developed for modelling and synthesizing complex information in comprehensive figures. Several types of infographics are now available for an informative and clear representation and analysis of your data. The best choice depends on the specific domain and question you are studying.

So how do you choose the best tools to efficiently explore your data and illustrate your scientific findings?

To help you answer this key question, we have initiated a series of user surveys on the main categories of data visualization tools most used by the OMICtools community. The first survey in the series concerns Circos plot generation tools.

Using Circos plots

Circos plots allow you to visualize data in a circular layout. This kind of representation is particularly useful to integrate and compare large amounts of data. The Circos plot is one of the best infographic types for showing relationships between elements. It has become a standard method for presenting genomics and epigenomics data, genome annotation and comparative genomics, offering fine visualization of sequence alignments, conservation, synteny, rearrangements, gene expression, methylation levels, and more. Circos plots can also be used to display data from any domain with multi-layer features and relationships.
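
The core of a circular layout is a mapping from positions along each sequence to angles around a circle. The Python sketch below shows one simple way to do this. It is not how the Circos software itself is implemented, and the chromosome names, lengths and gap size are made-up values for illustration.

```python
def angular_layout(lengths, gap_deg=2.0):
    """Give each chromosome a (start, end) angle in degrees, proportional to its length."""
    total = sum(lengths.values())
    usable = 360.0 - gap_deg * len(lengths)  # degrees left after the gaps
    layout, angle = {}, 0.0
    for name, length in lengths.items():
        span = usable * length / total
        layout[name] = (angle, angle + span)
        angle += span + gap_deg
    return layout


def position_to_angle(layout, lengths, chrom, pos):
    """Convert a position on a chromosome to an angle on the circle."""
    start, end = layout[chrom]
    return start + (end - start) * pos / lengths[chrom]


# Hypothetical chromosome lengths in arbitrary units.
lengths = {"chr1": 200, "chr2": 100, "chr3": 100}
layout = angular_layout(lengths)
for name, (start, end) in layout.items():
    print(f"{name}: {start:.1f} to {end:.1f} degrees")

# A link between two loci becomes a pair of angles to connect with a curve.
link = (
    position_to_angle(layout, lengths, "chr1", 50),
    position_to_angle(layout, lengths, "chr3", 80),
)
print("link angles:", link)
```

Once every data point has an angle, tracks such as histograms or heat maps only differ in the radius at which they are drawn.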

Here are the top 3 tools for creating Circos plots, selected by 65 OMICtools members.

The Gold medal goes to the popular Circos tool

Your #1 top tool is the well-known command-line based Circos software, with 66% of the votes.

Originally conceived for visualizing genomic data such as alignments and structural variations, Circos uses a circular ideogram layout that can display data as scatter, line or histogram plots, heat maps, tiles, connectors, and text.

Circos has features that make it ideal for drawing genomic information. Shown here are ChIP-Seq, chr 22 methylation, whole-genome methylation, multi-species comparison, human genome variation and self-similarity and MLL recombinome.

Circos is a free command-line application written in Perl. It can be deployed on any operating system for which Perl is available (e.g. Windows, Mac OS X, Linux and other UNIX). Circos produces bitmap (PNG) and vector (SVG) images using plain text configuration and input files. A very complete website with documentation is available with a series of 8 online tutorials presenting each specific feature of Circos, a quick guide, support through the Circos forum, as well as several examples of published images.

For the last 10 years, this tool has helped thousands of scientists from various fields to create beautiful representations of their data. Circos software has been used and referenced in more than 500 scientific publications, as well as in a wider variety of publications such as the New York Times.

Silver medal for BioCircos.js and ggbio tools

The second place went to the BioCircos.js library and the R package ggbio, with 40% of the votes each.

Web visualization applications have the advantage of generating interactive graphs, in which all elements are interactive with mouse-over explanations and clickable buttons. This provides a more user-friendly Circos plot representation with easily accessible information.

BioCircos.js is an open source interactive JavaScript library, based on the D3 (Data-Driven Documents) and jQuery JavaScript libraries. It offers flexible plugins and powerful functionality for developers who need to build web-based applications for Circos plot generation. BioCircos.js supports multiple platforms and works in all major internet browsers (Google Chrome is recommended). BioCircos.js version 1.1 is available (since September 2016), as well as updated documentation. Several modules are provided (SNP, CNV, HEATMAP, LINK, LINE, SCATTER, ARC, TEXT, and HISTOGRAM) to display genome-wide genetic variations (SNPs, CNVs and chromosome rearrangement), gene expression and biomolecule interactions.

The ggbio R package (version 1.24.1) offers the advantage of using the statistical functionality available in R as well as the grammar of graphics and the data handling capabilities of the Bioconductor project. A quick start guide and a manual were also released with Bioconductor. This tool has been mainly used to explore genome annotations and HTS data. The figures provide detailed views of genomic regions, sequence alignments and splicing patterns, and genome-wide overviews with karyogram, circular and grand linear layouts.

ggbio application: Representation of copy number whole-genome profiles of five follicular lymphoma tumor samples generated from the Affymetrix Mapping 500K array. From Yin et al., 2012. Genome Biology.

Bronze medal for the recent CircosVCF tool

The third place went to the web application CircosVCF with 34% of the votes.

CircosVCF is a free interactive web interface designed for visualizing variants in genome-wide datasets. It was implemented in JavaScript and supports several browsers (Chrome, Firefox, Explorer 10+, Edge). CircosVCF provides Circos visualization of input files in the standard Variant Call Format (large VCF files). It offers a very simplified user-friendly graphical interface to create Circos plots with an interactive design and the integration of additional information such as experimental data or annotations. The visualization capabilities of CircosVCF give a global overview of relationships between genomes and allow identification of SNP regions.
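
If you have never looked inside a VCF file, the Python sketch below shows the basic structure. Header lines start with "#", and each data line is tab-separated, beginning with the columns CHROM, POS, ID, REF and ALT. The records, the identifier rs0001 and the helper names are invented for illustration. This is not CircosVCF code, and the SNP test is deliberately simplified.

```python
from collections import Counter

# A tiny, made-up VCF excerpt (real files carry more header lines and sample columns).
VCF_TEXT = """\
##fileformat=VCFv4.2
#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO
chr1\t10177\t.\tA\tC\t50\tPASS\t.
chr1\t20200\trs0001\tG\tT\t99\tPASS\t.
chr2\t5000\t.\tT\tTA\t30\tPASS\t.
"""


def parse_vcf(text):
    """Yield (chrom, pos, ref, alt) for each data line, skipping header lines."""
    for line in text.splitlines():
        if not line or line.startswith("#"):
            continue
        chrom, pos, _id, ref, alt = line.split("\t")[:5]
        yield chrom, int(pos), ref, alt


def is_snp(ref, alt):
    """Simplified check: single-base REF and single-base ALT alleles.

    Does not handle special ALT values such as '.' or '*'.
    """
    return len(ref) == 1 and all(len(a) == 1 for a in alt.split(","))


variants = list(parse_vcf(VCF_TEXT))
snps_per_chrom = Counter(
    chrom for chrom, _pos, ref, alt in variants if is_snp(ref, alt)
)
print(snps_per_chrom)
```

Per-chromosome counts or positions like these are the kind of values a circular plot can then display as tracks.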

Our next survey on data visualization will focus on heatmap generation tools. You are welcome to participate!

References

(Krzywinski et al., 2009) Circos: an information aesthetic for comparative genomics. Genome Research.
(Cui et al., 2016) BioCircos.js: an interactive Circos JavaScript library for biological data visualization on web applications. Bioinformatics.
(Yin et al., 2012) ggbio: an R package for extending the grammar of graphics for genomic data. Genome Biology.
(Drori et al., 2017) CircosVCF: circos visualization of whole-genome sequence variations stored in VCF files. Bioinformatics.