Process 16S rRNA sequences with the sl1p tool


Advancing DNA sequencing technologies have encouraged a surge of microbiome studies. The microbiome, the set of microbes (bacteria, viruses, archaea) that live in a particular environmental niche, has been studied extensively, including in the context of human disease, changes in ecological environments, and progressive oxygen gradients in the deep sea. One of the most popular methods for these studies is sequencing segments of the 16S rRNA gene, a gene that is highly conserved among bacteria yet contains variable regions, which allows researchers to identify the taxonomic diversity within a given bacterial niche.

Drs. Whelan and Surette have recently developed a new tool, sl1p, that helps automate the processing of 16S rRNA gene sequencing data and provides analyses that let users jump straight into their own microbiome research questions without extensive bioinformatics training. Here, they describe the main features and benefits of their tool.

The need for a better tool

Many tools and pipelines exist for processing microbial marker gene data. Popular options such as QIIME and mothur implement different approaches and algorithms for each processing step, or leave the choice among them to the user. Further, these tools often consist of a series of command-line steps that are time-consuming and difficult to reproduce. To address these issues, we developed the short-read library 16S rRNA gene sequencing pipeline (sl1p; pronounced “slip”), a stand-alone pipeline that automates these steps into an easy-to-use, reproducible workflow.

sl1p processes 16S rRNA gene sequencing data with the most biologically accurate tools

To process 16S rRNA gene sequencing data, a series of steps must be carried out. These include, but are not limited to, quality filtering, chimera checking, picking operational taxonomic units (OTUs), and assigning taxonomy to OTUs (Fig. 1). sl1p implements a wide variety of algorithms and options for each of these steps. Importantly, sl1p's defaults were chosen to represent the tools and approaches that performed best in a comprehensive comparison using mock human microbiome sequencing datasets and cultured isolates. Detailed information about these comparisons can be found in Whelan FJ & Surette MG (2017) Microbiome.

Figure 1. Processing steps implemented in sl1p
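To make the OTU-picking step described above concrete, here is a toy sketch of greedy, abundance-ordered clustering at a fixed identity threshold (97% is the traditional OTU cutoff). It is not sl1p's implementation: it assumes reads are already aligned to equal length and measures identity by simple position-wise matching, whereas real OTU pickers align sequences. The function names identity and greedy_otus are hypothetical.

```python
from collections import Counter


def identity(a: str, b: str) -> float:
    """Fraction of matching positions between two equal-length, pre-aligned reads."""
    if len(a) != len(b):
        raise ValueError("this toy identity measure needs equal-length reads")
    return sum(1 for x, y in zip(a, b) if x == y) / len(a)


def greedy_otus(reads: list[str], threshold: float = 0.97) -> dict[str, list[str]]:
    """Group reads into OTUs keyed by their centroid sequence.

    Unique sequences are visited from most to least abundant. Each one joins
    the first existing centroid it matches at >= threshold identity;
    otherwise it becomes a new centroid.
    """
    otus: dict[str, list[str]] = {}
    for seq, n in Counter(reads).most_common():
        for centroid in otus:
            if identity(seq, centroid) >= threshold:
                otus[centroid].extend([seq] * n)
                break
        else:
            # No centroid was close enough: start a new OTU.
            otus[seq] = [seq] * n
    return otus


if __name__ == "__main__":
    reads = ["ACGTACGTAC"] * 3 + ["ACGTACGTAA"] + ["TTTTTTTTTT"] * 2
    for centroid, members in greedy_otus(reads, threshold=0.9).items():
        print(centroid, len(members))
```

With a threshold of 0.9 in this 10-bp example, the single-mismatch read joins the abundant ACGTACGTAC centroid, giving two OTUs of 4 and 2 reads.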

sl1p conducts preliminary analyses of microbial community data

sl1p’s output includes preliminary analyses that give the user a broad understanding of their data immediately after a run. These include a summary of the number of non-bacterial reads in each sample, taxonomic summaries of each sample at several taxonomic levels (phylum, class, order, family, and genus), and alpha- and beta-diversity outputs using three different distance metrics (Fig. 2). Importantly, these outputs are produced with both QIIME and R, and the raw commands for both are included so that users can reuse them as they interrogate their data further to answer their own research questions, making these analyses more approachable for non-bioinformaticians.

Figure 2. Preliminary analyses provided in sl1p
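As an illustration of what alpha- and beta-diversity outputs measure, the sketch below computes the Shannon index (alpha diversity, within one sample) and the Bray-Curtis dissimilarity (beta diversity, between two samples) from OTU counts. These two metrics are common choices used here as assumptions; the text does not say which metrics sl1p reports, and sl1p itself produces these outputs with QIIME and R rather than with code like this.

```python
import math


def shannon(counts: list[int]) -> float:
    """Shannon index H' = -sum(p_i * ln p_i) over OTUs with non-zero counts."""
    total = sum(counts)
    return -sum((c / total) * math.log(c / total) for c in counts if c > 0)


def bray_curtis(x: list[int], y: list[int]) -> float:
    """Bray-Curtis dissimilarity: sum|x_i - y_i| / sum(x_i + y_i)."""
    if len(x) != len(y):
        raise ValueError("both samples must list OTU counts in the same order")
    numerator = sum(abs(a - b) for a, b in zip(x, y))
    denominator = sum(a + b for a, b in zip(x, y))
    return numerator / denominator


if __name__ == "__main__":
    # Hypothetical OTU table: counts for the same three OTUs in two samples.
    sample_a = [10, 0, 5]
    sample_b = [0, 10, 5]
    print(round(shannon(sample_a), 3), round(shannon(sample_b), 3))
    print(round(bray_curtis(sample_a, sample_b), 3))  # 0.667
```

Bray-Curtis ranges from 0 (identical composition) to 1 (no shared OTUs), which is why it is often used for beta-diversity ordinations.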

sl1p promotes reproducibility

The main goal of sl1p was to make reproducible and accurate microbiome research more accessible. sl1p produces a comprehensive logfile (Fig. 3) that records exactly how sl1p was called, the versions of each software dependency, and how each processing step was conducted. This logfile makes it possible to reproduce a given sl1p run, or to understand how small changes in the processing workflow alter the resulting output. sl1p also provides an R Markdown file detailing each step of its preliminary analyses. This file is a natural starting point for the user's own analyses, and it makes transparent how the sl1p outputs are generated.

Figure 3. sl1p logfile produced after analysis
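The logfile described above records how the pipeline was called and which software versions were used. A minimal Python sketch of such a provenance record might look like the following; the JSON layout, the field names and the placeholder tool name are hypothetical and do not reflect sl1p's actual logfile format.

```python
import json
import platform
import sys
from datetime import datetime, timezone


def run_record(steps: dict[str, dict], tool_versions: dict[str, str]) -> dict:
    """Gather what is needed to repeat a pipeline run (hypothetical fields)."""
    return {
        "timestamp_utc": datetime.now(timezone.utc).isoformat(),
        "command_line": list(sys.argv),   # how the pipeline was invoked
        "python_version": platform.python_version(),
        "platform": platform.platform(),
        "tool_versions": tool_versions,   # version of each dependency
        "steps": steps,                   # parameters used at each step
    }


def write_log(path: str, record: dict) -> None:
    """Write the record to disk as human-readable JSON."""
    with open(path, "w", encoding="utf-8") as handle:
        json.dump(record, handle, indent=2)


if __name__ == "__main__":
    record = run_record(
        steps={"quality_filter": {"min_phred": 20}, "otu_picking": {"identity": 0.97}},
        tool_versions={"example_tool": "0.0.0"},  # placeholder, not a real dependency
    )
    print(json.dumps(record, indent=2))
```

Storing the record as plain JSON keeps it both human-readable and easy to compare between two runs.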


Whelan FJ & Surette MG (2017). A comprehensive evaluation of the sl1p pipeline for 16S rRNA gene sequencing analysis. Microbiome.

Your top 3 RNA-seq quality control tools


RNA-sequencing (RNA-seq) is currently the leading technology for transcriptome analysis. Its applications range from the study of alternative splicing and post-transcriptional modifications to the comparison of relative gene expression between biological samples.

To help you prepare and analyse your RNA-seq experiments under the best conditions, we have launched a new series of surveys on the best tools for each fundamental step of an RNA-seq experiment.

Starting your analysis with quality control

The first step in the analysis of an RNA-seq experiment is quality control. This crucial step ensures that your data are of sufficient quality for the subsequent steps of your analysis. Quality control usually covers sequence quality, sequencing depth, read duplication rates (clonal reads), alignment quality, nucleotide composition bias, etc.
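Read duplication rate can be defined in several ways. One simple definition, assumed here, is the fraction of reads whose exact sequence already appeared earlier in the file. QC tools such as FastQC use their own definitions and sampling strategies, so treat this as a sketch of the idea only; duplication_rate is a hypothetical name.

```python
from collections import Counter


def duplication_rate(sequences: list[str]) -> float:
    """Fraction of reads identical to a read seen earlier in the list.

    A sequence observed n times contributes n - 1 duplicates.
    """
    if not sequences:
        return 0.0
    counts = Counter(sequences)
    duplicates = sum(n - 1 for n in counts.values())
    return duplicates / len(sequences)


if __name__ == "__main__":
    reads = ["ACGT", "ACGT", "GGCC", "ACGT"]
    print(duplication_rate(reads))  # 0.5
```

In RNA-seq, a high value is not always a problem: highly expressed transcripts naturally produce many identical reads.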

We therefore start this series by presenting the best QC tools, as chosen by the OMICtools community!

Your number 1 tool: NGS QC Toolkit

NGS QC Toolkit was the favorite tool for 79% of OMICtools members.

This standalone, open-source application provides several tools to quality-check and filter your NGS data. The toolbox is divided into four major groups of tools:

  • Quality control tools for Illumina or Roche 454 data
  • Trimming tools
  • Format conversion tools
  • Statistics tools

All QC tools can output graphs as well as various statistics, such as the average quality score at each base position and the GC content distribution.

NGS QC Toolkit toolbox
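The per-position quality and GC-content statistics mentioned above can be sketched in a few lines of Python. This assumes uncompressed 4-line FASTQ records with Phred+33 quality encoding (the usual encoding for current Illumina data); parse_fastq and the other function names are hypothetical and are not part of NGS QC Toolkit, which is written in Perl.

```python
def parse_fastq(text: str) -> list[tuple[str, str]]:
    """Return (sequence, quality) pairs from 4-line FASTQ records."""
    lines = [line for line in text.splitlines() if line.strip()]
    records = []
    for i in range(0, len(lines) - 3, 4):
        header, seq, plus, qual = lines[i:i + 4]
        if not header.startswith("@") or not plus.startswith("+"):
            raise ValueError(f"malformed FASTQ record number {i // 4 + 1}")
        records.append((seq, qual))
    return records


def mean_quality_per_position(records: list[tuple[str, str]], offset: int = 33) -> list[float]:
    """Mean Phred score at each base position (Phred+33 encoding by default)."""
    sums: list[int] = []
    counts: list[int] = []
    for _, qual in records:
        for pos, char in enumerate(qual):
            if pos == len(sums):  # first read long enough to reach this position
                sums.append(0)
                counts.append(0)
            sums[pos] += ord(char) - offset
            counts[pos] += 1
    return [s / c for s, c in zip(sums, counts)]


def gc_content(seq: str) -> float:
    """Fraction of G and C among all characters of a read (N counts as a base)."""
    if not seq:
        return 0.0
    seq = seq.upper()
    return (seq.count("G") + seq.count("C")) / len(seq)


if __name__ == "__main__":
    fastq = "@r1\nACGT\n+\nIIII\n@r2\nGGCC\n+\n!!II\n"
    records = parse_fastq(fastq)
    print(mean_quality_per_position(records))  # [20.0, 20.0, 40.0, 40.0]
    print([gc_content(seq) for seq, _ in records])  # [0.5, 1.0]
```

In Phred+33, the character 'I' encodes a score of 40 and '!' a score of 0, which is why the first two positions average to 20.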

The application can be downloaded here: Link. It runs on Windows and Linux operating systems, provided ActivePerl is installed.

Your second favorite tool: RseqFlow

RseqFlow is an RNA-seq analysis pipeline that covers pre- and post-mapping quality control, as well as other analysis steps. The pipeline is divided into four branches, which can be run individually or in workflow mode:

  • Branch 1: Quality Control and SNP calling based on the merging of alignments to the transcriptome and genome.
  • Branch 2: Expression level quantification for Gene/Exon/Splice Junctions based on alignment to the transcriptome.
  • Branch 3: File format conversions for easy storage, backup and visualization.
  • Branch 4: Differentially expressed gene identification based on the output of the expression level quantification from Branch 2.

RseqFlow is available as a downloadable virtual machine (VM) image, managed with Pegasus, that lets users run the pipeline easily on different computational resources: Link

RseqFlow can also be run in Unix shell mode, which lets users execute each branch of the analysis with a Unix command (Python 2.7 or higher, R 2.11 or higher, and GCC must be pre-installed).

Third place for Trim Galore!

Trim Galore! is a wrapper script that automates quality and adapter trimming as well as quality control, with added functionality to remove biased methylation positions in RRBS sequence files (from directional, non-directional, or paired-end sequencing).

Its main features include:

  • For adapter trimming, Trim Galore! uses the first 13 bp of the Illumina standard adapter (‘AGATCGGAAGAGC’) by default, which is suitable for both ends of paired-end libraries, but it also accepts other adapter sequences
  • For MspI-digested RRBS libraries, Trim Galore! performs quality and adapter trimming in two successive steps. This allows it to remove 2 additional bases that contain a cytosine artificially introduced during the end-repair step of library preparation
  • For any kind of FastQ file other than MspI-digested RRBS, Trim Galore! can perform adapter and quality trimming in a single pass
  • The Phred quality of basecalls and the stringency for adapter removal can be specified individually
  • And more…

Example of a dataset downloaded from the SRA which was trimmed with a Phred score threshold of 20

Trim Galore! is built around Cutadapt and FastQC, so both tools must be installed for it to work properly.
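To show what quality and adapter trimming do to a single read, here is a simplified sketch. The quality step uses a running-sum rule in the style of the modified BWA trimming algorithm that Cutadapt documents (accumulate cutoff minus quality from the 3' end and cut where that sum peaks). The adapter step uses the 13-bp default adapter from the list above but only exact matching, whereas Cutadapt also tolerates mismatches. Function names, the 3-bp minimum overlap and the 20-bp minimum length are illustrative assumptions, not Trim Galore!'s code.

```python
from typing import Optional

ADAPTER = "AGATCGGAAGAGC"  # first 13 bp of the Illumina standard adapter


def quality_trim_index(qual: str, cutoff: int = 20, offset: int = 33) -> int:
    """Index at which to cut the 3' end, using a BWA-style running sum.

    Walking from the 3' end, add (cutoff - quality) for each base; stop once the
    sum turns negative, and cut at the position where the sum was largest.
    """
    running = best = 0
    cut = len(qual)
    for i in range(len(qual) - 1, -1, -1):
        running += cutoff - (ord(qual[i]) - offset)
        if running < 0:
            break
        if running > best:
            best, cut = running, i
    return cut


def adapter_index(seq: str, adapter: str = ADAPTER, min_overlap: int = 3) -> int:
    """Start of a full adapter match, or of a partial adapter at the 3' end."""
    full = seq.find(adapter)
    if full != -1:
        return full
    # The read may end partway through the adapter: test adapter prefixes
    # aligned to the 3' end, longest first.
    for k in range(min(len(adapter) - 1, len(seq)), min_overlap - 1, -1):
        if seq.endswith(adapter[:k]):
            return len(seq) - k
    return len(seq)


def trim_read(seq: str, qual: str, cutoff: int = 20,
              min_length: int = 20) -> Optional[tuple[str, str]]:
    """Quality-trim, then adapter-trim; drop reads shorter than min_length."""
    end = quality_trim_index(qual, cutoff)
    seq, qual = seq[:end], qual[:end]
    end = adapter_index(seq)
    seq, qual = seq[:end], qual[:end]
    return (seq, qual) if len(seq) >= min_length else None


if __name__ == "__main__":
    read = "ACGTACGTACGTACGTACGT" + ADAPTER  # 20-bp insert followed by adapter
    print(trim_read(read, "I" * len(read)))
```

For the example read, the adapter is removed and the 20-bp insert is returned with its qualities; a read left shorter than min_length after trimming yields None.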

The tool can be downloaded here: Link. It comes with a comprehensive, illustrated User Guide.


Analyse co-expression gene modules with CEMiTool


Identifying changes in the expression levels of individual genes is a common analysis step after a microarray or RNA-Seq experiment. The expression levels of co-expressed genes can also be analyzed and visualized with gene co-expression networks (GCNs), undirected graphs that represent co-expression relationships between pairs of genes across samples.
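A gene co-expression network, as defined above, can be built by correlating every pair of genes across samples and keeping sufficiently correlated pairs as undirected edges. The sketch below uses Pearson correlation with a hard cutoff of |r| >= 0.8, both chosen purely for illustration. CEMiTool's own module detection is more elaborate and selects its parameters automatically, so this shows only the basic GCN idea; pearson and coexpression_edges are hypothetical names.

```python
import math
from itertools import combinations


def pearson(x: list[float], y: list[float]) -> float:
    """Pearson correlation coefficient; returns 0.0 if either vector is constant."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    norm_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    norm_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    if norm_x == 0 or norm_y == 0:
        return 0.0
    return cov / (norm_x * norm_y)


def coexpression_edges(expr: dict[str, list[float]],
                       min_abs_r: float = 0.8) -> dict[frozenset, float]:
    """Undirected edges {gene1, gene2} -> r for gene pairs with |r| >= min_abs_r."""
    edges: dict[frozenset, float] = {}
    for g1, g2 in combinations(sorted(expr), 2):
        r = pearson(expr[g1], expr[g2])
        if abs(r) >= min_abs_r:
            # A frozenset key has no order, matching an undirected edge.
            edges[frozenset((g1, g2))] = r
    return edges


if __name__ == "__main__":
    # Hypothetical expression values for three genes across four samples.
    expr = {
        "geneA": [1.0, 2.0, 3.0, 4.0],
        "geneB": [2.0, 4.0, 6.0, 8.0],
        "geneC": [4.0, 1.0, 3.0, 2.0],
    }
    for edge, r in coexpression_edges(expr).items():
        print(sorted(edge), round(r, 3))
```

Only geneA and geneB are linked here (r = 1), while geneC correlates with each of them at r = -0.4 and stays unconnected.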

Dr. Helder Nakaya from the University of São Paulo has recently developed CEMiTool, an easy-to-use method for automatically running gene co-expression analyses in R. Here, he describes the features provided by CEMiTool.

Analyse your transcriptomic data for co-expression modules

The analysis of co-expression gene modules can help uncover the mechanisms underlying disease and infection. CEMiTool is a fast, easy-to-use Bioconductor package that unifies the discovery and analysis of co-expression modules.

Among its features, CEMiTool evaluates whether modules are enriched for genes from specific pathways or are altered in a specific sample group. It also integrates transcriptomic data with interactome information to identify potential hubs in each network.

In addition, CEMiTool provides a novel unsupervised gene filtering method and automated parameter selection for identifying modules. The tool then reports all results in HTML pages with high-quality plots and interactive tables.

CEMiTool features

Several functions can be run independently, or all at once using the cemitool function.

With a single command, CEMiTool can generate a plot that displays the expression of each gene within a module:

Expression of each gene within a module

CEMiTool can also determine which biological functions are associated with a module by performing an over-representation analysis (ORA). For this analysis, a pathway list must be provided as a GMT file:


Biological functions associated with the module.
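The ORA step can be illustrated with the classic hypergeometric test: given a gene universe of size N, a pathway with K genes in that universe, and a module of n genes of which k are in the pathway, the p-value is P(X >= k). The sketch below also parses the GMT format (one gene set per line: name, description, then member genes, tab-separated). It is a hypothetical stand-alone re-implementation, not CEMiTool's code, and it omits the multiple-testing correction that a real ORA applies across pathways.

```python
from math import comb


def parse_gmt(text: str) -> dict[str, set[str]]:
    """Parse GMT lines: set name, description, then member genes, tab-separated."""
    gene_sets: dict[str, set[str]] = {}
    for line in text.splitlines():
        fields = line.split("\t")
        if len(fields) >= 3:
            gene_sets[fields[0]] = {g for g in fields[2:] if g}
    return gene_sets


def hypergeom_sf(k: int, N: int, K: int, n: int) -> float:
    """P(X >= k) for X ~ Hypergeometric(population N, K successes, n draws)."""
    upper = min(K, n)
    hits = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return hits / comb(N, n)


def ora(module: set[str], gene_sets: dict[str, set[str]],
        universe: set[str]) -> list[tuple[str, int, float]]:
    """(set name, overlap, p-value) for each gene set sharing genes with the module."""
    module = module & universe
    results = []
    for name, genes in gene_sets.items():
        genes = genes & universe
        overlap = len(module & genes)
        if overlap:
            p = hypergeom_sf(overlap, len(universe), len(genes), len(module))
            results.append((name, overlap, p))
    return sorted(results, key=lambda row: row[2])  # smallest p-value first


if __name__ == "__main__":
    gmt = "P1\tpathway one\tg0\tg1\tg2\nP2\tpathway two\tg5\tg6\n"
    universe = {f"g{i}" for i in range(10)}
    print(ora({"g0", "g1", "g2"}, parse_gmt(gmt), universe))
```

In the toy example all three module genes fall in pathway P1, giving p = 1/120, the chance of drawing exactly those three genes from a universe of ten.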

Finally, interaction data, such as protein-protein interactions, can be visualized in annotated module graphs:

Annotated graph showing interactions within a module.
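One simple way to find candidate hub genes, assumed here for illustration, is to rank module genes by their number of interaction partners (degree) within the module. CEMiTool's own hub selection may use a different measure; module_hubs and the gene names below are hypothetical.

```python
from collections import Counter


def module_hubs(module_genes: list[str], interactions: list[tuple[str, str]],
                top: int = 3) -> list[tuple[str, int]]:
    """Rank module genes by degree in the interaction graph restricted to the module."""
    module = set(module_genes)
    seen: set[frozenset] = set()
    degree: Counter = Counter()
    for a, b in interactions:
        if a == b or a not in module or b not in module:
            continue  # skip self-loops and edges leaving the module
        edge = frozenset((a, b))
        if edge in seen:
            continue  # undirected: A-B and B-A are the same interaction
        seen.add(edge)
        degree[a] += 1
        degree[b] += 1
    return degree.most_common(top)


if __name__ == "__main__":
    # Hypothetical module and interaction list.
    module = ["geneA", "geneB", "geneC", "geneD"]
    edges = [("geneA", "geneB"), ("geneB", "geneA"), ("geneA", "geneC"),
             ("geneD", "geneA"), ("geneD", "geneX")]
    print(module_hubs(module, edges, top=1))  # [('geneA', 3)]
```

Restricting edges to the module keeps hubs specific to that module rather than to the whole interactome.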

Overall, CEMiTool provides the following benefits:

  • An easy-to-use package that automates the entire module discovery process, including gene filtering and functional analyses, within a single R function (cemitool)
  • Comprehensive modular analysis
  • A fully automated process

A comprehensive instruction guide for CEMiTool is provided on Bioconductor: Link


Russo P, Ferreira G, Bürger M, Cardozo L and Nakaya H (2017). CEMiTool: Co-expression Modules identification Tool. R package version 1.1.1.