Your top 3 RNA-seq quality control tools

RNA sequencing (RNA-seq) is currently the leading technology for transcriptome analysis. It has a wide range of applications, from the study of alternative splicing and post-transcriptional modifications to the comparison of relative gene expression between biological samples.

To help you prepare and analyse your RNA-seq experiments in the best conditions, we have launched a new series of surveys focused on the best tools for each fundamental step of an RNA-seq experiment.

Starting your analysis with quality control

The first step in the analysis of an RNA-seq experiment is quality control. This crucial step ensures that your data are of sufficient quality for the subsequent steps of your analysis. Quality control usually covers sequence quality, sequencing depth, read duplication rates (clonal reads), alignment quality, nucleotide composition bias, etc.
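To make one of these metrics concrete, here is a minimal sketch of a read duplication rate. It assumes that a duplicate is a read whose sequence is identical to one already seen; real QC tools may define duplicates differently (for example, after alignment). The function name and the toy reads are hypothetical.

```python
from collections import Counter


def duplication_rate(sequences):
    """Fraction of reads that are exact copies of an earlier read.

    Assumption: duplicates are defined by identical sequence only.
    """
    counts = Counter(sequences)
    total = sum(counts.values())
    if total == 0:
        return 0.0
    # Every read beyond the first copy of each distinct sequence is a duplicate.
    duplicates = total - len(counts)
    return duplicates / total


# Toy example: "ACGT" appears three times, so two of the five reads are duplicates.
reads = ["ACGT", "ACGT", "TTGA", "ACGT", "GGCC"]
print(f"{duplication_rate(reads):.2f}")  # prints 0.40
```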

We therefore start this series by presenting the best QC tools, as chosen by the OMICtools community!

Your number 1 tool: NGS QC Toolkit

NGS QC Toolkit was the favorite tool for 79% of OMICtools members.

This standalone, open-source application offers several tools to quality check and filter your NGS data. The toolbox is divided into four major groups of tools:

  • Quality control tools for Illumina or Roche 454 data
  • Trimming tools
  • Format conversion tools
  • Statistics tools

All QC tools can generate graphs as outputs, as well as diverse statistics, such as average quality scores at each base position, GC content distribution, etc.
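The two statistics mentioned above can be sketched in a few lines of Python. This is not NGS QC Toolkit's code (the toolkit is written in Perl); it is only an illustration that assumes well-formed four-line FASTQ records and Phred+33 quality encoding. All function names are hypothetical.

```python
def parse_fastq(lines):
    """Yield (sequence, quality_string) pairs from four-line FASTQ records."""
    lines = [line.rstrip("\n") for line in lines]
    for i in range(0, len(lines) - 3, 4):
        yield lines[i + 1], lines[i + 3]


def mean_quality_per_position(records, offset=33):
    """Average Phred score at each base position (Phred+33 is an assumption)."""
    sums, counts = [], []
    for _, qual in records:
        for pos, char in enumerate(qual):
            if pos == len(sums):  # first read that reaches this position
                sums.append(0)
                counts.append(0)
            sums[pos] += ord(char) - offset
            counts[pos] += 1
    return [s / c for s, c in zip(sums, counts)]


def gc_content(seq):
    """Fraction of G and C bases in a single read."""
    if not seq:
        return 0.0
    return sum(base in "GCgc" for base in seq) / len(seq)


# Toy FASTQ: 'I' encodes Phred 40 and '5' encodes Phred 20 under Phred+33.
fastq_lines = ["@r1", "GCGC", "+", "IIII", "@r2", "ATGC", "+", "I5I5"]
records = list(parse_fastq(fastq_lines))
per_position = mean_quality_per_position(records)
gc_values = [gc_content(seq) for seq, _ in records]
print(per_position, gc_values)
```

Plotting `per_position` against the base index gives the familiar per-base quality curve, and a histogram of `gc_values` gives the GC content distribution.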

Figure: NGS QC Toolkit toolbox

The application can be downloaded here: Link and can be run on Windows and Linux operating systems, provided ActivePerl is installed.

Your second favorite tool: RseqFlow

RseqFlow is an RNA-seq analysis pipeline that covers pre- and post-mapping quality control, as well as other analysis steps. The pipeline is divided into four branches that can be run individually or in a workflow mode.

  • Branch 1: Quality Control and SNP calling based on the merging of alignments to the transcriptome and genome.
  • Branch 2: Expression level quantification for Gene/Exon/Splice Junctions based on alignment to the transcriptome.
  • Branch 3: Some file format conversions for easy storage, backup and visualization.
  • Branch 4: Differentially expressed gene identification based on the output of the expression level quantification from Branch 2.

RseqFlow provides a downloadable Virtual Machine (VM) image, managed with Pegasus, that lets users run the pipeline easily on different computational resources. It is available here: Link

RseqFlow can also be run in a Unix shell mode that lets users execute each branch of the analysis with a Unix command. The following software must be pre-installed: Python 2.7 or higher, R 2.11 or higher, and GCC.

Third place for Trim Galore!

Trim Galore! is a wrapper script that automates quality and adapter trimming as well as quality control, with some added functionality to remove biased methylation positions from RRBS sequence files (directional, non-directional, or paired-end sequencing).

Its main features include:

  • For adapter trimming, Trim Galore! uses the first 13 bp of Illumina standard adapters (‘AGATCGGAAGAGC’) by default (suitable for both ends of paired-end libraries), but accepts other adapter sequences, too
  • For MspI-digested RRBS libraries, Trim Galore! performs quality and adapter trimming in two successive steps. This allows it to remove 2 additional bases that contain a cytosine that was artificially introduced in the end-repair step during library preparation
  • For any kind of FastQ file other than MspI-digested RRBS, Trim Galore! can perform single-pass adapter- and quality trimming
  • The Phred quality of basecalls and the stringency for adapter removal can be specified individually
  • And more…

Figure: Example of a dataset downloaded from the SRA and trimmed with a Phred score threshold of 20

Trim Galore! is built around Cutadapt and FastQC, and thus requires both tools to be installed to function properly.
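To give an idea of what quality and adapter trimming do to a read, here is a simplified Python sketch. It is not Trim Galore!'s or Cutadapt's actual code. It assumes 3' quality trimming with a running-sum approach in the style of BWA, followed by removal of the adapter using exact matches only (real tools tolerate mismatches). The `min_overlap` parameter is a hypothetical stand-in for the stringency setting, and all function names are made up for the example.

```python
ADAPTER = "AGATCGGAAGAGC"  # first 13 bp of the Illumina standard adapter


def quality_trim(seq, quals, cutoff=20):
    """Trim the low-quality 3' end using a running sum of (quality - cutoff)."""
    running = 0
    lowest = 0
    cut = len(seq)
    for i in range(len(quals) - 1, -1, -1):
        running += quals[i] - cutoff
        if running > 0:  # good bases now outweigh the bad ones: stop scanning
            break
        if running < lowest:  # best cut position found so far
            lowest = running
            cut = i
    return seq[:cut], quals[:cut]


def adapter_trim(seq, adapter=ADAPTER, min_overlap=1):
    """Remove the adapter and everything after it (exact matches only)."""
    pos = seq.find(adapter)
    if pos != -1:
        return seq[:pos]
    # Otherwise look for a partial adapter at the very end of the read.
    longest = min(len(adapter) - 1, len(seq))
    for overlap in range(longest, min_overlap - 1, -1):
        if seq.endswith(adapter[:overlap]):
            return seq[:len(seq) - overlap]
    return seq


def trim_read(seq, quals, cutoff=20, min_overlap=1):
    """Quality-trim first, then adapter-trim (order assumed for this sketch)."""
    seq, quals = quality_trim(seq, quals, cutoff)
    seq = adapter_trim(seq, ADAPTER, min_overlap)
    return seq, quals[:len(seq)]


# Toy read: 8 bp insert + full adapter + 2 low-quality bases.
read = "ACGTACGT" + ADAPTER + "TT"
qualities = [35] * 21 + [8, 5]
trimmed_seq, trimmed_quals = trim_read(read, qualities)
print(trimmed_seq)  # prints ACGTACGT
```

Raising `min_overlap` makes the sketch less eager to clip short partial matches at the end of a read, at the cost of leaving a few adapter bases behind.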

The tool is downloadable here: Link and comes with a comprehensive and illustrated User Guide.

Your top 3 gene clustering software tools

Clustering is a fundamental step in the analysis of biological and omics data. It is used to construct groups of objects (genes, proteins) that share related functions or expression patterns, or that are known to interact. In microarray or RNA-seq experiments, gene clustering is often combined with a heatmap representation for data visualization.

Choosing the right clustering tool for your analysis

Many clustering methods and algorithms have been developed and are classified into partitioning (k-means), hierarchical (connectivity-based), density-based, model-based and graph-based approaches.
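As an illustration of the partitioning family, here is a minimal k-means sketch in plain Python. It is a teaching example, not the implementation used by any of the tools below. It assumes squared Euclidean distance and, to keep the result reproducible, uses the first k profiles as starting centroids (real implementations usually pick them at random). The expression values are made up.

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two expression profiles."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, k, iterations=20):
    """Alternate between assigning points to centroids and updating centroids."""
    centroids = [list(p) for p in points[:k]]  # deterministic start (assumption)
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: each profile joins its nearest centroid.
        for i, p in enumerate(points):
            labels[i] = min(range(k), key=lambda c: sq_dist(p, centroids[c]))
        # Update step: each centroid becomes the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids


# Hypothetical expression profiles of four genes across three samples.
profiles = [
    [1.0, 1.2, 0.9],
    [5.0, 5.1, 4.8],
    [1.1, 0.9, 1.0],
    [4.9, 5.2, 5.0],
]
labels, centroids = kmeans(profiles, k=2)
print(labels)  # prints [0, 1, 0, 1]
```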

To help you choose between all the existing clustering tools, we asked OMICtools members to vote for their favorite software. Here are the top 3 tools, chosen by 23 voters.

First place for ClustEval

ClustEval is a web-based clustering analysis platform developed at the Max Planck Institute for Informatics and the University of Southern Denmark. It is designed to objectively compare the performance of various clustering methods across different datasets.

More precisely, ClustEval has been used to compare the performance of 18 of the most widely used clustering methods on 24 different datasets. These datasets include gene expression data, protein sequence similarity, protein structure similarity, social networks, word sense disambiguation, etc. The performance of each clustering method is then evaluated with an F1 score (the harmonic mean of precision and recall).
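Here is a small sketch of how an F1 score can be computed for a clustering. The text above only states that F1 is the harmonic mean of precision and recall; the pair-counting definition used here (two objects placed in the same cluster count as a positive) is an assumption for illustration and may differ from ClustEval's exact procedure.

```python
from itertools import combinations


def pairwise_f1(predicted, gold):
    """F1 score over object pairs; inputs are cluster labels, one per object."""
    tp = fp = fn = 0
    for i, j in combinations(range(len(gold)), 2):
        same_pred = predicted[i] == predicted[j]
        same_gold = gold[i] == gold[j]
        if same_pred and same_gold:
            tp += 1  # pair correctly placed together
        elif same_pred:
            fp += 1  # pair wrongly placed together
        elif same_gold:
            fn += 1  # pair wrongly separated
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return 0.0
    # Harmonic mean of precision and recall.
    return 2 * precision * recall / (precision + recall)


# Toy example: object 2 is assigned to the wrong cluster.
gold_labels = [0, 0, 0, 1, 1, 1]
predicted_labels = [0, 0, 1, 1, 1, 1]
print(round(pairwise_f1(predicted_labels, gold_labels), 3))  # prints 0.615
```

Because only co-membership of pairs is compared, the score does not depend on how the clusters happen to be numbered.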

Finally, ClustEval can be downloaded and installed by users who want to run their own clustering comparisons, using a VirtualBox image, Docker and Docker Compose, or an R package.

Figure: Performance of all clustering tools on all nonartificial data sets on the basis of F1 scores.

Second position for Babelomics

Babelomics is a web application developed by the Computational Genomics Department of the Principe Felipe Research Center in Valencia. It performs a wide range of functional analyses of gene expression and genomic data, from processing to expression analysis and gene set enrichment.

In its current version, Babelomics 5, the website offers a user-friendly and intuitive interface for clustering microarray or RNA-seq data using one of three methods: UPGMA, SOTA, and k-means. The result can then be visualized as a heatmap. Example datasets and analyses are provided for every functionality of the application, and tutorials are available here.

Figure: Babelomics clustering tool.

Third place for AltAnalyze

AltAnalyze is a comprehensive application for the analysis of single-cell and bulk RNA-seq data that can automate every step of gene expression and splicing analysis, including clustering and heatmap representation. It was developed in the Nathan Salomonis laboratory at Cincinnati Children’s Hospital Medical Center and the University of Cincinnati.

AltAnalyze offers many options for clustering algorithms and normalization, as well as unique features such as finding optimized clusters for single-cell analysis.

AltAnalyze can be downloaded and run on all operating systems, and comes with useful documentation (tutorials, blog, FAQ).

Figure: Heatmap and clustering generated with AltAnalyze

References

(Wiwie et al., 2015) Comparing the performance of biomedical clustering methods. Nature Methods.

(Alonso et al., 2015) Babelomics 5.0: functional interpretation for new generations of genomic data. Nucleic Acids Research.

(Emig et al., 2010) AltAnalyze and DomainGraph: analyzing and visualizing exon expression data. Nucleic Acids Research.

(Olson et al., 2016) Single-cell analysis of mixed-lineage states leading to a binary cell fate choice. Nature.

Single-cell RNA sequencing in immunology

Single-cell RNA sequencing (scRNA-seq) has revolutionized the study of the immune system and now has a wide range of applications in immunology. This technology spans the whole genome and provides an unbiased gene expression profile of individual cells.

Bulk vs single-cell RNA-seq

Traditional bulk RNA-seq is often performed on well-identified groups of cells thought to be homogeneous. However, molecular changes are quantified as a mean over millions of cells, which averages out the signal of individual cells and ignores cell-to-cell heterogeneity, a hallmark of adaptive immune cell subsets such as B and T lymphocytes.
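A toy calculation shows why averaging hides heterogeneity. The expression values below are hypothetical: one population is a mix of cells with the gene "off" and "on", the other expresses the gene uniformly at an intermediate level. Both give the same bulk signal.

```python
from statistics import mean

# Hypothetical expression of one gene in six cells per population.
mixed_population = [0.0, 0.0, 0.0, 10.0, 10.0, 10.0]   # half "off", half "on"
uniform_population = [5.0, 5.0, 5.0, 5.0, 5.0, 5.0]    # all cells alike


def fraction_expressing(cells):
    """Share of cells in which the gene is detected at all."""
    return sum(value > 0 for value in cells) / len(cells)


# A bulk measurement only sees the mean, which is identical for both populations.
bulk_mixed = mean(mixed_population)
bulk_uniform = mean(uniform_population)
print(bulk_mixed, bulk_uniform)  # prints 5.0 5.0

# Per-cell values reveal the difference.
print(fraction_expressing(mixed_population))    # prints 0.5
print(fraction_expressing(uniform_population))  # prints 1.0
```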

The need to identify new and discrete immune cell populations, and to understand the molecular changes that occur at the single-cell level, has driven the development of low-input RNA-seq protocols. These protocols now have a multitude of applications and come with many new analysis tools.

Figure: Single-Cell RNA-Sequencing Analysis Outline. From Neu et al.

Main applications of scRNA-seq in immunology

Identification of new cell types and functions

By spanning the whole genome in search of unknown molecular markers, scRNA-seq can be used to identify new cell types and functions. While traditional qPCR approaches are sensitive and easy to perform, they require prior knowledge and are based on the measurement of a preselected pool of genes, which introduces bias. Using scRNA-seq technology in the context of immune response to a stimulus (infection, vaccination, autoimmunity) can lead to the identification of new activities and functions. Gene expression and quantification tools originally designed for bulk RNA-seq have now been successfully adopted for scRNA-seq data, including STAR, RSEM and Kallisto.

Characterization of heterogeneous populations

Adaptive immune cells such as B and T lymphocytes use V(D)J recombination to generate a highly diverse repertoire of receptors to recognize antigens. By combining single-cell identification of clonotypes with cell phenotype (e.g., responsive, autoreactive, or anergic), researchers can find strategies to augment or dampen specific immune responses. Several tools have now been developed to help you reconstruct full-length T and B cell receptors from scRNA-seq data, such as TraCeR, BASIC, and ImReP.
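Once receptor sequences have been reconstructed, combining clonotype with phenotype comes down to grouping cells. The sketch below is not the output format of TraCeR, BASIC, or ImReP. It assumes each cell has already been annotated with an alpha-chain and a beta-chain CDR3 and a phenotype label, and that cells sharing both chains belong to the same clonotype. All identifiers and values are placeholders.

```python
from collections import defaultdict

# Hypothetical per-cell annotations: (cell id, alpha CDR3, beta CDR3, phenotype).
cells = [
    ("cell1", "CDR3a_A", "CDR3b_A", "responsive"),
    ("cell2", "CDR3a_A", "CDR3b_A", "responsive"),
    ("cell3", "CDR3a_B", "CDR3b_B", "anergic"),
    ("cell4", "CDR3a_A", "CDR3b_A", "anergic"),
    ("cell5", "CDR3a_C", "CDR3b_C", "responsive"),
]


def group_by_clonotype(cells):
    """Map each (alpha, beta) clonotype to the phenotypes of its cells."""
    clones = defaultdict(list)
    for _cell_id, alpha, beta, phenotype in cells:
        clones[(alpha, beta)].append(phenotype)
    return dict(clones)


clones = group_by_clonotype(cells)

# Expanded clonotypes (assumed here to mean two or more cells) and their phenotypes.
expanded = {clone: phenos for clone, phenos in clones.items() if len(phenos) >= 2}
for clone, phenos in expanded.items():
    print(clone, len(phenos), sorted(set(phenos)))
```

In this toy dataset, a single clonotype is expanded and contains both responsive and anergic cells, which is the kind of observation that linking clonotype to phenotype makes possible.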

Figure: TCR sequences assembled from scRNA-seq reads during Salmonella infection in mice. From Stubbington et al.

Mapping transition states and cell fate decisions

Immune cell populations arise from precursor cells and go through a succession of checkpoints and states before becoming fully mature and functional. Mapping transition states and cell lineages with scRNA-seq can provide insights into developmental aspects of the immune system in health and disease. Specific tools, such as Monocle and TSCAN, let you order individual cells in pseudotime along bifurcating developmental trajectories.

Figure: Bifurcating pseudotime trajectory. From Stubbington et al.

Personalized medicine

In the near future, scRNA-seq could revolutionize the field of personalized medicine in cancer by enabling researchers to identify individual clones and biomarkers in a tumor, and select precision drugs for each of them. Because one particular tumor cell can drive drug resistance or metastasis, scRNA-seq can provide critical information for rapid and personalized treatment. Of particular interest, the ESTIMATE algorithm can be applied to scRNA-seq data to identify the tumor phenotype and the proportion of tumor, immune, or stromal cells.

Figure: scRNA-seq applications in cancer medicine. From Shalek and Benson.

Future directions

From flow cytometry to microscopy, the study of the immune system has often relied on technologies that operate at a single-cell resolution. With next-generation sequencing (NGS) technologies becoming cheaper, scRNA-seq will probably be routinely used by researchers in the near future.

Upcoming challenges will include data management and development of integrated multiplex tools to combine transcriptomics with other genomic data.

Based on recent papers:

(Neu et al., 2016) Single-Cell Genomics: Approaches and Utility in Immunology. Trends in Immunology

(Papalexi and Satija, 2017) Single-cell RNA sequencing to explore immune cell heterogeneity. Nature Reviews Immunology

(Shalek and Benson, 2017) Single-cell analyses to tailor treatments. Science Translational Medicine.

(Stubbington et al., 2017) Single-cell transcriptomics to explore the immune system in health and disease. Science.