RNA sequencing (RNA-seq) is currently the leading technology for transcriptome analysis. It has a wide range of applications, from the study of alternative splicing and post-transcriptional modifications to the comparison of relative gene expression between biological samples.
To help you prepare and analyse your RNA-seq experiments under the best conditions, we have launched a new series of surveys focused on the best tools for each fundamental step of an RNA-seq experiment.
Starting your analysis with quality control
The first step in the analysis of an RNA-seq experiment is quality control. This crucial step ensures that your data are of sufficient quality for the subsequent steps of your analysis. Quality control usually covers sequence quality, sequencing depth, read duplication rates (clonal reads), alignment quality, nucleotide composition bias, etc.
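To make two of these metrics concrete, here is a minimal Python sketch that estimates the read duplication rate and the per-position nucleotide composition from FASTQ records. It is an illustration only, not how any of the tools below compute them: the function names are hypothetical, duplicates are defined as exactly identical sequences, and the input is assumed to be well-formed four-line FASTQ.

```python
from collections import Counter


def parse_fastq(lines):
    """Yield (sequence, quality_string) pairs from well-formed 4-line FASTQ records."""
    records = [line.strip() for line in lines if line.strip()]
    for i in range(0, len(records) - 3, 4):
        yield records[i + 1], records[i + 3]


def duplication_rate(sequences):
    """Fraction of reads that are exact copies of an earlier read."""
    counts = Counter(sequences)
    total = sum(counts.values())
    return 1 - len(counts) / total if total else 0.0


def base_composition(sequences):
    """For each read position, return the fraction of each nucleotide observed."""
    per_position = []
    for seq in sequences:
        for i, base in enumerate(seq):
            if i == len(per_position):
                per_position.append(Counter())
            per_position[i][base] += 1
    return [
        {base: n / sum(counter.values()) for base, n in counter.items()}
        for counter in per_position
    ]
```

A strong deviation of the composition from one position to the next, or a high duplication rate, is the kind of signal a QC report would flag for closer inspection.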
We therefore start this series by presenting the best QC tools, as chosen by the OMICtools community!
Your number 1 tool: NGS QC Toolkit
NGS QC Toolkit was the favorite tool for 79% of OMICtools members.
This standalone, open-source application offers several tools to quality-check and filter your NGS data. The toolkit is divided into four major groups of tools:
- Quality control tools for Illumina or Roche 454 data
- Trimming tools
- Format conversion tools
- Statistics tools
All QC tools can generate graphs as outputs, as well as diverse statistics, such as average quality scores at each base position, GC content distribution, etc.
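As a rough idea of what such statistics involve, the following Python sketch computes the average quality score at each base position and a binned GC content distribution. It is not the NGS QC Toolkit's own code: the function names are hypothetical, and the quality strings are assumed to use the Phred+33 encoding.

```python
from collections import Counter


def mean_quality_per_position(quality_strings, offset=33):
    """Average Phred score at each read position (Phred+33 encoding assumed)."""
    totals, counts = [], []
    for qual in quality_strings:
        for i, char in enumerate(qual):
            if i == len(totals):
                totals.append(0)
                counts.append(0)
            totals[i] += ord(char) - offset
            counts[i] += 1
    return [total / count for total, count in zip(totals, counts)]


def gc_content(seq):
    """Fraction of G and C bases in a read."""
    if not seq:
        return 0.0
    return sum(1 for base in seq.upper() if base in "GC") / len(seq)


def gc_distribution(sequences, bin_width=10):
    """Histogram mapping the lower edge of each GC% bin to a read count."""
    histogram = Counter()
    for seq in sequences:
        percent = 100 * gc_content(seq)
        # Reads with 100% GC are placed in the last bin.
        bin_start = min(int(percent // bin_width) * bin_width, 100 - bin_width)
        histogram[bin_start] += 1
    return dict(histogram)
```

Plotting the first result against read position, and the second as a bar chart, gives the kind of graphs described above.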
Your second favorite tool: RseqFlow
RseqFlow is an RNA-seq analysis pipeline that covers pre- and post-mapping quality control, as well as other analysis steps. The pipeline is divided into four branches that can be run individually or in workflow mode.
- Branch 1: Quality Control and SNP calling based on the merging of alignments to the transcriptome and genome.
- Branch 2: Expression level quantification for Gene/Exon/Splice Junctions based on alignment to the transcriptome.
- Branch 3: Some file format conversions for easy storage, backup and visualization.
- Branch 4: Differentially expressed gene identification based on the output of the expression level quantification from Branch 2.
RseqFlow provides a downloadable Virtual Machine (VM) image managed with Pegasus, which allows users to run the pipeline easily on different computational resources.
RseqFlow can also be run in a Unix shell mode that allows users to execute each branch of the analysis with a Unix command. The following software must be pre-installed: Python 2.7 or higher, R 2.11 or higher, and GCC.
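Before running the shell mode, these prerequisites can be checked with a few lines of Python. The sketch below is not part of RseqFlow: the helper names and the executable names (`python`, `R`, `gcc`) are assumptions. Versions are compared as numeric tuples, because a plain string comparison would wrongly rank 2.11 below 2.7.

```python
import re
import shutil

# Minimum versions quoted in the text; the executable names are assumptions.
REQUIREMENTS = {"python": (2, 7), "R": (2, 11), "gcc": None}


def parse_version(text):
    """Extract the first 'major.minor' pair from a version banner as a tuple of ints."""
    match = re.search(r"(\d+)\.(\d+)", text)
    return (int(match.group(1)), int(match.group(2))) if match else None


def meets_minimum(banner, minimum):
    """Compare versions numerically, so that 2.11 counts as newer than 2.7."""
    version = parse_version(banner)
    return version is not None and version >= minimum


def missing_executables(names=REQUIREMENTS):
    """Return the required executables that cannot be found on the PATH."""
    return [name for name in names if shutil.which(name) is None]
```

The banner passed to `meets_minimum` would typically be the text printed by a command such as `python --version` or `R --version`.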
Third place for Trim Galore!
Trim Galore! is a wrapper script that automates quality and adapter trimming as well as quality control, with some added functionality to remove biased methylation positions from RRBS sequence files (for directional, non-directional or paired-end sequencing).
Its main features include:
- For adapter trimming, Trim Galore! uses the first 13 bp of Illumina standard adapters (‘AGATCGGAAGAGC’) by default (suitable for both ends of paired-end libraries), but accepts other adapter sequences, too
- For MspI-digested RRBS libraries, Trim Galore! performs quality and adapter trimming in two successive steps. This allows it to remove 2 additional bases that contain a cytosine which was artificially introduced in the end-repair step during the library preparation
- For any kind of FastQ file other than MspI-digested RRBS, Trim Galore! can perform single-pass adapter and quality trimming
- The Phred quality of basecalls and the stringency for adapter removal can be specified individually
- And more…
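The trimming logic described in this list can be sketched in Python as follows. This is a deliberately simplified illustration and not Trim Galore!'s actual implementation: the function names are hypothetical, only exact adapter matches are detected (no mismatch tolerance), quality trimming simply drops trailing low-quality bases, Phred+33 encoding is assumed, and the default values for `cutoff` and `stringency` are assumptions. Here `stringency` means the minimum number of bases that must overlap with the adapter at the 3' end of the read.

```python
ADAPTER = "AGATCGGAAGAGC"  # first 13 bp of the Illumina standard adapter (from the text)


def quality_trim(seq, qual, cutoff=20, offset=33):
    """Drop trailing bases whose Phred score is below the cutoff (simplified)."""
    end = len(seq)
    while end > 0 and ord(qual[end - 1]) - offset < cutoff:
        end -= 1
    return seq[:end], qual[:end]


def find_adapter(seq, adapter=ADAPTER, stringency=1):
    """Return the index where the adapter starts, or None (exact matches only)."""
    position = seq.find(adapter)
    if position != -1:
        return position
    # Partial adapter: the end of the read equals the start of the adapter.
    longest = min(len(adapter) - 1, len(seq))
    for overlap in range(longest, max(stringency, 1) - 1, -1):
        if seq.endswith(adapter[:overlap]):
            return len(seq) - overlap
    return None


def trim_read(seq, qual, cutoff=20, stringency=1, rrbs=False):
    """Quality-trim, then adapter-trim a read; optionally remove 2 extra bases (RRBS)."""
    seq, qual = quality_trim(seq, qual, cutoff)
    position = find_adapter(seq, stringency=stringency)
    if position is not None:
        if rrbs:
            # Remove the 2 additional bases introduced by end-repair.
            position = max(position - 2, 0)
        seq, qual = seq[:position], qual[:position]
    return seq, qual
```

For example, `trim_read("ACGTACGTAC" + ADAPTER, "I" * 23)` keeps only the ten bases preceding the adapter, and the same call with `rrbs=True` keeps eight. A low `stringency` removes even very short adapter-like read ends, while a higher value is more conservative.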
(Patel et al., 2012) NGS QC Toolkit: A Toolkit for Quality Control of Next Generation Sequencing Data. PLoS ONE.
(Wang et al., 2011) RseqFlow: workflows for RNA-Seq data analysis. Bioinformatics.
(Wu et al., 2011) Using non-uniform read distribution models to improve isoform expression inference in RNA-Seq. Bioinformatics.