Single-cell RNA sequencing in immunology


Single-cell RNA sequencing (scRNA-seq) has revolutionized the study of the immune system and now has a wide range of applications in immunology. This technology spans the whole genome and provides an unbiased gene expression profile of individual cells.

Bulk vs single-cell RNA-seq

Traditional bulk RNA-seq is often performed on well-identified groups of cells thought to be homogeneous. However, molecular changes are quantified as a mean over millions of cells, which averages out the signal of individual cells and ignores cell-to-cell heterogeneity, a hallmark of adaptive immune cell subsets such as B and T lymphocytes.
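To make the averaging problem concrete, here is a minimal sketch with invented numbers (the counts and the function names `bulk_signal` and `cell_to_cell_spread` are assumptions for illustration, not taken from any of the cited papers). Two populations give the same bulk-style average for a gene, although one of them is a mix of non-expressing and strongly expressing cells.

```python
import statistics

# Hypothetical expression counts for one gene in ten individual cells.
# Half of the cells do not express the gene; the other half express it strongly.
mixed_population = [0, 0, 0, 0, 0, 20, 20, 20, 20, 20]

# A homogeneous population in which every cell expresses the gene moderately.
uniform_population = [10, 10, 10, 10, 10, 10, 10, 10, 10, 10]


def bulk_signal(cells):
    """Mimic a bulk measurement: a single averaged value for the whole sample."""
    return statistics.mean(cells)


def cell_to_cell_spread(cells):
    """Population standard deviation across cells, only visible at single-cell resolution."""
    return statistics.pstdev(cells)


bulk_mixed = bulk_signal(mixed_population)
bulk_uniform = bulk_signal(uniform_population)
spread_mixed = cell_to_cell_spread(mixed_population)
spread_uniform = cell_to_cell_spread(uniform_population)

print("bulk averages:", bulk_mixed, bulk_uniform)
print("cell-to-cell spread:", spread_mixed, spread_uniform)
```

The two averages are identical, so a bulk readout cannot tell these samples apart. Only the per-cell values reveal that the first sample contains two distinct subpopulations.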

The need to identify new and discrete immune cell populations and to understand molecular changes that occur at the single-cell level has favored the development of low-input RNA-seq protocols, which now have many different applications and come with a growing set of new analysis tools.

Single-Cell RNA-Sequencing Analysis Outline. From Neu et al.

Main applications of scRNA-seq in immunology

Identification of new cell types and functions

By spanning the whole genome in search of unknown molecular markers, scRNA-seq can be used to identify new cell types and functions. While traditional qPCR approaches are sensitive and easy to perform, they require prior knowledge and are based on the measurement of a preselected pool of genes, which introduces bias. Using scRNA-seq technology in the context of immune response to a stimulus (infection, vaccination, autoimmunity) can lead to the identification of new activities and functions. Gene expression and quantification tools originally designed for bulk RNA-seq have now been successfully adopted for scRNA-seq data, including STAR, RSEM and Kallisto.

Characterization of heterogeneous populations

Adaptive immune cells such as B and T lymphocytes use V(D)J recombination to generate a highly diverse repertoire of receptors to recognize antigens. By combining single-cell identification of clonotypes with cell phenotype (e.g. responsive, autoreactive or anergic), researchers can find strategies to augment or lower specific immune responses. Several tools, such as TraCeR, BASIC, and ImReP, have now been developed to help you reconstruct full-length T and B cell receptors from scRNA-seq data.

TCR sequences assembled from scRNA-seq reads during Salmonella infection in mice. From Stubbington et al.

Mapping transition states and cell fate decisions

Immune cell populations arise from precursor cells and go through a succession of checkpoints and states before becoming fully mature and functional. Mapping transition states and cell lineages with scRNA-seq can provide insights into developmental aspects of the immune system in health and disease. Specific tools, such as Monocle and TSCAN, let you order individual cells along pseudotime and bifurcating developmental trajectories.

Bifurcating pseudotime trajectory. From Stubbington et al.

Personalized medicine

In the near future, scRNA-seq could revolutionize the field of personalized medicine in cancer by enabling researchers to identify individual clones and biomarkers in a tumor, and select precision drugs for each of them. Because one particular tumor cell can drive drug resistance or metastasis, scRNA-seq can provide critical information for rapid and personalized treatment. Of particular interest, the ESTIMATE algorithm can be applied to scRNA-seq data to identify the tumor phenotype and the proportion of tumor, immune, or stromal cells.

scRNA-seq applications in cancer medicine. From Shalek and Benson.

Future directions

From flow cytometry to microscopy, the study of the immune system has often relied on technologies that operate at a single-cell resolution. With next-generation sequencing (NGS) technologies becoming cheaper, scRNA-seq will probably be routinely used by researchers in the near future.

Upcoming challenges will include data management and development of integrated multiplex tools to combine transcriptomics with other genomic data.

Based on recent papers:

(Neu et al., 2016) Single-Cell Genomics: Approaches and Utility in Immunology. Trends in Immunology

(Papalexi and Satija, 2017) Single-cell RNA sequencing to explore immune cell heterogeneity. Nature Reviews Immunology

(Shalek and Benson, 2017) Single-cell analyses to tailor treatments. Science Translational Medicine.

(Stubbington et al., 2017) Single-cell transcriptomics to explore the immune system in health and disease. Science.

Evaluating biomedical data production with text mining


Estimating biomedical data

Evaluating the impact of a scientific study is a difficult and controversial task. Recognition of the value of a biomedical study is widely measured by traditional bibliographic metrics such as the number of citations of the paper and the impact factor of the journal.

However, a more relevant success criterion for a research study likely lies in the production of biological data itself, both in terms of quality and in how these datasets can be reused to validate (or reject!) hypotheses and support new research projects. Although biological data can be deposited in specific repositories such as the GEO database, ImmPort, ENA, etc., most data are primarily disseminated in articles within the text, figures and tables. This raises the question: how can we find and measure the production of biomedical data diffused in scientific publications?

To address this issue, Gabriel Rosenfeld and Dawei Lin developed a novel text-mining strategy that identifies articles producing biological data. They published their method, “Estimating the scale of biomedical data generation using text mining”, this month on bioRxiv.

Text mining analysis of biomedical research articles

Using the Global Vectors for Word Representation (GloVe) algorithm, the authors identified term usage signatures for 5 types of biomedical data: flow cytometry, immunoassays, genomic microarray, microscopy, and high-throughput sequencing.

They then analyzed the free text of 129,918 PLOS articles published between 2013 and 2016. They found that nearly half of them (59,543) generated 1 or more of the 5 data types tested, producing 81,407 data sets.
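The figures quoted above can be turned into two simple ratios. The short calculation below uses only the three counts given in the text; the variable names are my own.

```python
# Counts quoted in the text for the PLOS corpus (2013-2016).
total_articles = 129_918  # articles whose free text was analyzed
data_articles = 59_543    # articles generating at least one of the 5 data types
data_sets = 81_407        # data sets attributed to those articles

# Share of analyzed articles that generated at least one data type.
share_generating_data = data_articles / total_articles

# Average number of data sets per data-generating article.
data_sets_per_article = data_sets / data_articles

print(f"{share_generating_data:.1%} of articles generated data")
print(f"{data_sets_per_article:.2f} data sets per data-generating article")
```

In other words, about 46% of the analyzed articles produced data, and each of those articles contributed roughly 1.4 data sets on average, which means that a fair number of them combined several data types.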


Estimating PLOS articles generating each biomedical data type over time (from “Estimating the scale of biomedical data generation using text mining”, bioRxiv).

This text-mining method was tested on manually annotated articles and provided a useful balance of precision and recall. The obvious and exciting next step is to apply this approach to evaluate the amount and types of data generated within the entire PubMed repository of articles.
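For readers less familiar with these two metrics, here is a minimal sketch of how precision and recall can be computed against manual annotations. The labels below are invented toy data and `precision_recall` is a hypothetical helper; neither comes from the authors' evaluation.

```python
def precision_recall(predicted, annotated):
    """Return (precision, recall) for two equal-length lists of booleans.

    predicted: True where the text-mining method flags an article as data-generating.
    annotated: True where a human annotator says the article generates that data type.
    """
    true_pos = sum(p and a for p, a in zip(predicted, annotated))
    false_pos = sum(p and not a for p, a in zip(predicted, annotated))
    false_neg = sum(a and not p for p, a in zip(predicted, annotated))
    # Precision: of the flagged articles, how many were right?
    precision = true_pos / (true_pos + false_pos) if (true_pos + false_pos) else 0.0
    # Recall: of the truly data-generating articles, how many were found?
    recall = true_pos / (true_pos + false_neg) if (true_pos + false_neg) else 0.0
    return precision, recall


# Hypothetical toy labels for eight articles (illustration only).
annotated = [True, True, True, False, False, True, False, False]
predicted = [True, True, False, True, False, True, False, False]

precision, recall = precision_recall(predicted, annotated)
print(f"precision={precision:.2f}, recall={recall:.2f}")
```

A method tuned only for precision would miss many data-producing articles, while one tuned only for recall would flag many articles that contain no such data, hence the interest in balancing the two.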



A step beyond data dissemination

Evaluating the exponentially growing amount and diversity of datasets is currently a key aspect of determining the quality of a biomedical study. However, in today’s era of bioinformatics, fully exploiting the data means going a step beyond the publication and dissemination of datasets and tools, towards improving data reproducibility and transparency (data provenance, collection, transformation, computational analysis methods, etc.).

Open-access and community-driven projects, such as the online bioinformatics tools platform OMICtools, provide access not only to a large number of repositories to locate valuable datasets, but also to the best software tools for re-analyzing and exploiting the full potential of these datasets.

In a virtuous circle of discovery, previously generated datasets could be repurposed for new data production, interactive visualization, machine learning and artificial intelligence enhancement, allowing us to answer new biomedical questions.

Improving DNA amplification for single-cell genomics


The single-cell DNA sequencing challenge

Deep sequencing of genomes (whole genome sequencing, WGS) is important not only to improve our knowledge in life sciences and evolutionary biology but also to make clinical progress. The analysis of the genome and its variations at the single-cell level has major applications: analysis of mutation rates in somatic cells, including copy-number variations (CNVs) and single-nucleotide variations (SNVs), evolution of cancer, recombination in germ cells, preimplantation genetic analysis of embryos, and analysis of microbial populations (mini-metagenomics).

Because of the low amount of DNA in a cell, single-cell whole genome sequencing requires whole genome amplification (WGA). The three methods currently used are degenerate oligonucleotide-primed polymerase chain reaction (DOP-PCR), multiple displacement amplification (MDA), and multiple annealing and looping-based amplification cycles (MALBAC). However, these methods have a limited capability to detect genomic variants and introduce amplification bias, artefacts and errors (see the overview by Gawad C. et al.).

New methodology for single-cell whole genome amplification

To overcome the limitations of exponential amplification, the Xie group recently developed the Linear Amplification via Transposon Insertion (LIANTI) method.

LIANTI takes advantage of Tn5 transposition and T7 in vitro transcription to linearly amplify genomic DNA fragments from a single human cell.


Fig 1. LIANTI scheme. Genomic DNA from a single cell is randomly fragmented and tagged by the LIANTI transposon, followed by DNA polymerase gap extension to convert single-stranded T7 promoter loops into double-stranded T7 promoters on both ends of each fragment. In vitro transcription overnight is performed to linearly amplify the genomic DNA fragments into genomic RNAs, which are capable of self-priming on the 3′ end. After reverse transcription, RNase digestion, and second-strand synthesis, double-stranded LIANTI amplicons tagged with unique molecular barcodes are formed, representing the amplified product of the original genomic DNA from a single cell, and ready for DNA library preparation and next-generation sequencing. From Chen C. et al. Science. 356:189-194.
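Why would linear amplification help with bias? The toy model below is my own simplification, not the authors' analysis: the efficiencies, the cycle count and both helper functions are assumptions chosen for illustration. It compares two fragments whose amplification efficiency differs by 5%. In the exponential case copies are themselves re-copied, so the difference compounds every cycle. In the linear case only the original template is copied, so the difference stays constant.

```python
def exponential_copies(efficiency, cycles):
    """Copies after `cycles` rounds when each round multiplies the pool by (1 + efficiency)."""
    return (1.0 + efficiency) ** cycles


def linear_copies(rate, rounds):
    """Copies when the original template is copied `rate` times per round and copies are never re-copied."""
    return rate * rounds


# Hypothetical fragments: B is amplified 5% less efficiently than A.
efficiency_a, efficiency_b = 1.00, 0.95
cycles = 20  # assumed number of rounds, for illustration only

# Ratio of fragment B to fragment A in the final product under each scheme.
exp_ratio = exponential_copies(efficiency_b, cycles) / exponential_copies(efficiency_a, cycles)
lin_ratio = linear_copies(efficiency_b, cycles) / linear_copies(efficiency_a, cycles)

print(f"B/A after exponential amplification: {exp_ratio:.2f}")
print(f"B/A after linear amplification:      {lin_ratio:.2f}")
```

Under these assumptions the less efficient fragment ends up at roughly 60% of the abundance of the other one after exponential amplification, but at 95% after linear amplification. Real WGA reactions are far more complex, so this should be read only as an intuition for why compounding matters.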

LIANTI exhibits the highest amplification uniformity among current WGA methods. It allows accurate detection of single-cell micro-CNVs with kilobase resolution. The LIANTI method also achieves the highest amplification fidelity for accurate single-cell SNV detection.


Fig 2. LIANTI amplification uniformity and fidelity. (A) Coefficient of variation for read depths along the genome as a function of bin sizes from 1 b to 100 Mb, showing amplification noise on all scales for single-cell WGA methods, including DOP-PCR, MALBAC, MDA, and LIANTI. The normalized MALBAC data (dashed line) are shown together with the unnormalized MALBAC data. Only the unnormalized data of the other methods are shown, as no substantial improvement by normalization was observed. The Poisson curve is the expected coefficient of variation for read depth assuming only Poisson noise. LIANTI exhibits a much improved amplification uniformity over the previous methods on all scales. (B) False-positive rates of SNV detection in a single BJ cell. The error bars were calculated from three different BJ cells. From Chen C. et al. Science. 356:189-194.
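The uniformity metric in panel A can be sketched in a few lines. The example below is an illustrative reimplementation under my own assumptions, not the authors' pipeline: the depth profile is invented, and `bin_depths`, `coefficient_of_variation` and `poisson_cv` are hypothetical helper names. It sums read depth into bins of increasing size, computes the coefficient of variation (standard deviation divided by mean) across bins, and gives the Poisson expectation, which follows from the variance of a Poisson count being equal to its mean.

```python
import math
import statistics


def bin_depths(per_base_depth, bin_size):
    """Sum per-position read depth into consecutive, non-overlapping bins.

    An incomplete final bin is dropped.
    """
    n_bins = len(per_base_depth) // bin_size
    return [sum(per_base_depth[i * bin_size:(i + 1) * bin_size]) for i in range(n_bins)]


def coefficient_of_variation(values):
    """Standard deviation divided by mean (population form)."""
    return statistics.pstdev(values) / statistics.mean(values)


def poisson_cv(mean_depth):
    """Expected CV for Poisson counts: sqrt(mean) / mean = 1 / sqrt(mean)."""
    return 1.0 / math.sqrt(mean_depth)


# Hypothetical toy read-depth profile over 16 positions (illustration only).
depth = [4, 0, 6, 2, 5, 1, 3, 3, 0, 8, 2, 2, 7, 1, 4, 4]

cv_by_bin_size = {
    size: coefficient_of_variation(bin_depths(depth, size))
    for size in (1, 2, 4, 8)
}

for size, cv in cv_by_bin_size.items():
    mean_per_bin = statistics.mean(bin_depths(depth, size))
    print(f"bin size {size}: CV={cv:.3f}, Poisson expectation={poisson_cv(mean_per_bin):.3f}")
```

In this toy profile the coefficient of variation falls as bins get larger, because local fluctuations average out. Amplification bias that persists over long genomic distances would keep the observed curve above the Poisson expectation even at large bin sizes, which is what the figure compares across methods.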

The high precision of genomic variant detection with the LIANTI method should improve the analysis of single-cell DNA sequences and support better diagnosis and understanding of the evolution of cancer and other diseases.

Based on the recent papers:

Chen C. et al. (2017) Single-cell whole-genome analyses by Linear Amplification via Transposon Insertion (LIANTI). Science. 356:189-194.

Gawad C. et al. (2016) Single-cell genome sequencing: current state of the science. Nat. Rev. Genet. 17(3):175-88.