Improving DNA amplification for single-cell genomics


The single-cell DNA sequencing challenge

Deep sequencing of genomes (whole-genome sequencing, WGS) is important not only for advancing life sciences and evolutionary biology but also for clinical progress. Analysis of the genome and its variations at the single-cell level has major applications: measuring mutation rates in somatic cells, including copy-number variations (CNVs) and single-nucleotide variations (SNVs); tracing the evolution of cancer; studying recombination in germ cells; preimplantation genetic analysis of embryos; and analysis of microbial populations (mini-metagenomics).

Because a single cell contains very little DNA, single-cell whole-genome sequencing requires whole-genome amplification (WGA). The three methods currently used are degenerate oligonucleotide-primed polymerase chain reaction (DOP-PCR), multiple displacement amplification (MDA), and multiple annealing and looping-based amplification cycles (MALBAC). However, these methods have a limited capability to detect genomic variants and introduce amplification bias, artefacts and errors (see the overview by Gawad C. et al.).

New methodology for single-cell whole genome amplification

To overcome the limitations of exponential amplification, the Xie group has recently developed the Linear Amplification via Transposon Insertion (LIANTI) method.

LIANTI takes advantage of Tn5 transposition and T7 in vitro transcription to linearly amplify genomic DNA fragments from a single human cell.
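To illustrate why linear amplification tends to be more uniform than exponential amplification, here is a minimal Python sketch. It is a toy model, not the LIANTI protocol: the per-fragment efficiencies, the cycle and round counts, and all function names are assumptions chosen for illustration. In the exponential case each cycle copies the copies, so small efficiency differences between fragments compound; in the linear case every copy is made from the original template, so they do not.

```python
import random
import statistics


def exponential_gain(efficiency, cycles):
    """Copies per template when each cycle multiplies the pool by (1 + efficiency)."""
    return (1.0 + efficiency) ** cycles


def linear_gain(rate, rounds):
    """Copies per template when every new copy is made from the original template."""
    return 1.0 + rate * rounds


def coefficient_of_variation(values):
    """Standard deviation divided by mean: a scale-free measure of unevenness."""
    return statistics.pstdev(values) / statistics.mean(values)


def simulate(n_fragments=1000, seed=0):
    """Return (CV under exponential gain, CV under linear gain) for toy fragments."""
    rng = random.Random(seed)
    # Hypothetical per-fragment efficiencies that differ by a few percent.
    efficiencies = [rng.uniform(0.90, 1.00) for _ in range(n_fragments)]
    exp_copies = [exponential_gain(e, cycles=20) for e in efficiencies]
    lin_copies = [linear_gain(e, rounds=1000) for e in efficiencies]
    return coefficient_of_variation(exp_copies), coefficient_of_variation(lin_copies)


if __name__ == "__main__":
    cv_exp, cv_lin = simulate()
    print(f"CV of copy number, exponential: {cv_exp:.3f}")
    print(f"CV of copy number, linear:      {cv_lin:.3f}")
```

In this toy setting the same spread of efficiencies produces a much larger spread of copy numbers under exponential gain than under linear gain, which is the intuition behind choosing a linear scheme.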



Fig 1. LIANTI scheme. Genomic DNA from a single cell is randomly fragmented and tagged by the LIANTI transposon, followed by DNA polymerase gap extension to convert single-stranded T7 promoter loops into double-stranded T7 promoters on both ends of each fragment. In vitro transcription overnight is performed to linearly amplify the genomic DNA fragments into genomic RNAs, which are capable of self-priming on the 3′ end. After reverse transcription, RNase digestion, and second-strand synthesis, double-stranded LIANTI amplicons tagged with unique molecular barcodes are formed, representing the amplified product of the original genomic DNA from a single cell, and ready for DNA library preparation and next-generation sequencing. From Chen C. et al. (2017) Science. 356:189-194.

Compared with the other current WGA methods, LIANTI exhibits the highest amplification uniformity, which allows accurate detection of single-cell micro-CNVs with kilobase resolution. The LIANTI method also achieves the highest amplification fidelity, enabling accurate single-cell SNV detection.


Fig 2. LIANTI amplification uniformity and fidelity. (A) Coefficient of variation for read depths along the genome as a function of bin sizes from 1 b to 100 Mb, showing amplification noise on all scales for single-cell WGA methods, including DOP-PCR, MALBAC, MDA, and LIANTI. The normalized MALBAC data (dashed line) are shown together with the unnormalized MALBAC data. Only the unnormalized data of the other methods are shown, as no substantial improvement by normalization was observed. The Poisson curve is the expected coefficient of variation for read depth assuming only Poisson noise. LIANTI exhibits a much improved amplification uniformity over the previous methods on all scales. (B) False-positive rates of SNV detection in a single BJ cell. The error bars were calculated from three different BJ cells. From Chen C. et al. (2017) Science. 356:189-194.
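The uniformity metric in panel A can be sketched in a few lines of Python. This is a simplified illustration on simulated data, not the analysis pipeline of the paper: the per-position counts are drawn independently, the regional bias factors are invented, and all function names are hypothetical. The idea is to sum counts into bins of increasing size, compute the coefficient of variation (standard deviation divided by mean) across bins, and compare it with the value expected from Poisson noise alone, which is 1/sqrt(mean count per bin).

```python
import math
import random
import statistics


def poisson_sample(rng, lam):
    """Draw one Poisson-distributed integer (Knuth's method, fine for small lam)."""
    threshold = math.exp(-lam)
    k, p = 0, 1.0
    while True:
        p *= rng.random()
        if p <= threshold:
            return k
        k += 1


def simulate_depth(n_positions, mean_depth, block, rng, biased):
    """Toy per-position counts; if biased, each block gets its own gain factor."""
    depth, factor = [], 1.0
    for pos in range(n_positions):
        if biased and pos % block == 0:
            factor = rng.uniform(0.5, 1.5)  # invented regional amplification bias
        depth.append(poisson_sample(rng, mean_depth * factor))
    return depth


def bin_depths(depth, bin_size):
    """Sum counts into consecutive, non-overlapping bins (incomplete tail dropped)."""
    n_bins = len(depth) // bin_size
    return [sum(depth[i * bin_size:(i + 1) * bin_size]) for i in range(n_bins)]


def coefficient_of_variation(values):
    return statistics.pstdev(values) / statistics.mean(values)


def poisson_cv(mean_count):
    """CV expected if bin counts were purely Poisson: 1 / sqrt(mean)."""
    return 1.0 / math.sqrt(mean_count)


def cv_profile(depth, bin_sizes):
    """Return (bin size, observed CV, Poisson-expected CV) for each bin size."""
    rows = []
    for size in bin_sizes:
        bins = bin_depths(depth, size)
        rows.append((size, coefficient_of_variation(bins),
                     poisson_cv(statistics.mean(bins))))
    return rows


if __name__ == "__main__":
    rng = random.Random(1)
    for label, biased in (("uniform", False), ("biased", True)):
        depth = simulate_depth(20000, 2.0, 1000, rng, biased)
        for size, cv, expected in cv_profile(depth, [1, 10, 100, 1000]):
            print(f"{label:8s} bin={size:5d} CV={cv:.3f} Poisson={expected:.3f}")
```

In this toy model, uniform coverage stays close to the Poisson curve at every bin size, whereas regional bias keeps the observed coefficient of variation above it at large bin sizes. That gap is what the figure reports for each WGA method.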

The high precision of genomic variant detection achieved by the LIANTI method should improve the analysis of single-cell DNA sequences and contribute to better diagnosis and a better understanding of the evolution of cancer and other diseases.

Based on the recent papers:

Chen C. et al. (2017) Single-cell whole-genome analyses by Linear Amplification via Transposon Insertion (LIANTI). Science. 356:189-194.

Gawad C. et al. (2016) Single-cell genome sequencing: current state of the science. Nat. Rev. Genet. 17(3):175-88.

 

Collaborations for development of bioinformatics tools

In the era of big data, international collaborations are crucial for data sharing and for engineering bioinformatics tools. Joint projects between people working at different locations worldwide favor the exchange of expertise and the development of new ideas. They make it possible to invest the time and effort required to develop good-quality programs and to benefit from shared funding resources.

This interactive chord diagram shows the collaborations between countries that have led to the creation of a new bioinformatics tool among the thousands within the OMICtools repository. Hover over one of the top countries to focus on its collaborations. The thickness of each curve represents the number of publications (built with D3: https://d3js.org/).

How to make software more robust?


Scientific quality and reproducibility rely on the traceability of the experimental data, statistical methods and bioinformatics tools used to generate results. Being unable to replicate and validate scientific results is unfortunately very common. This "reproducibility crisis", as Monya Baker named it, considerably slows down research progress and affects all fields, including chemistry, biology and medicine.

Best practices are urgently needed to improve the reproducibility of data analysis and hence to make software robust enough to be run by any user.

Indeed, most of the software tools used to produce scientific results and publications are prototypes and lack robustness. Usually designed and run by a single person in a specific computing environment, such code can be very difficult for other people to use on their own data and is too often abandoned after publication. Last month, Morgan Taschuk and Greg Wilson published "Ten simple rules for making research software more robust", a quick guide to mastering the key challenge of robustness in software engineering.

What is “robust” software?

The authors define robust software as “software that works for people other than the original author and on machines other than its creator’s.” This means that “it can be installed on more than one computer with relative ease, it works consistently as advertised, and it can be integrated with other tools.”

Increasing software robustness is a key concern for software developers and for all users who want to produce replicable and reproducible results and publish their work. Improving software robustness only takes the effort of following these ten simple rules, summarized in the list below:

1. Use version control

2. Document your code and usage

3. Make common operations easy to control

4. Version your releases

5. Reuse software (within reason)

6. Rely on build tools and package managers for installation

7. Do not require root or other special privileges to install or run

8. Eliminate hard-coded paths

9. Include a small test set that can be run to ensure the software is actually working

10. Produce identical results when given identical inputs
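Several of these rules can be illustrated in one small command-line script. The following Python sketch is only an example of how rules 3, 4, 8, 9 and 10 might look in practice; the tool itself (a line subsampler), its option names and its version number are hypothetical and are not taken from the paper.

```python
import argparse
import random
import sys
from pathlib import Path

__version__ = "0.1.0"  # hypothetical release number (rule 4)


def subsample_lines(lines, fraction, seed):
    """Keep each line with probability `fraction`; a fixed seed makes the
    result identical for identical inputs (rule 10)."""
    rng = random.Random(seed)
    return [line for line in lines if rng.random() < fraction]


def build_parser():
    """Expose common settings as options instead of editing the code (rule 3)."""
    parser = argparse.ArgumentParser(description="Subsample lines of a text file.")
    # Paths come from the command line, never from the source code (rule 8).
    parser.add_argument("input", nargs="?", type=Path, help="input text file")
    parser.add_argument("-o", "--output", type=Path, help="output file (default: stdout)")
    parser.add_argument("--fraction", type=float, default=0.1)
    parser.add_argument("--seed", type=int, default=42)
    parser.add_argument("--self-test", action="store_true", help="run the built-in test")
    parser.add_argument("--version", action="version", version=__version__)
    return parser


def self_test():
    """Small built-in test set that checks the tool actually works (rule 9)."""
    data = [f"read_{i}" for i in range(100)]
    first = subsample_lines(data, 0.5, seed=1)
    second = subsample_lines(data, 0.5, seed=1)
    assert first == second, "same input and seed must give the same output"
    assert set(first) <= set(data)
    return True


def main(argv=None):
    args = build_parser().parse_args(argv)
    if args.self_test or args.input is None:
        self_test()
        print("self-test passed")
        return 0
    lines = args.input.read_text().splitlines()
    kept = subsample_lines(lines, args.fraction, args.seed)
    text = "\n".join(kept) + ("\n" if kept else "")
    if args.output is None:
        sys.stdout.write(text)
    else:
        args.output.write_text(text)
    return 0


if __name__ == "__main__":
    sys.exit(main())
```

Saved under a name of your choice, the script could be run with an input path and, for example, `--fraction 0.5 --seed 7`, or with `--self-test` to check an installation. Because it only reads and writes paths supplied by the user, it needs no special privileges (rule 7).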

How OMICtools promotes software quality and traceability

OMICtools has developed several strategies to promote better quality of bioinformatics resources and reproducibility of computational analysis.

First, OMICtools promotes the citation of bioinformatics resources and exact code version identification for reproducibility and traceability of biological data analysis.

OMICtools brings together thousands of software tools in a single place where users can find all the relevant information to choose and use the program they need. Our search engine offers an easy way to get the list of tools dedicated to a specific question and analysis function. Moreover, citations and references are specified for each tool, as well as the successive program versions and obsolete links, to facilitate the survey of bioinformatics tools.

Secondly, OMICtools is a collaborative repository platform that facilitates the development, maintenance and follow-up of bioinformatics tools by programmers themselves.

Software developers can upload their source code directly to the OMICtools server so that the community can easily locate it. In addition to the research resource identifier (RRID) attributed to each OMICtools resource, each published source code version gets a unique digital object identifier (DOI). A DOI provides interoperable exchange with other digital resources and persistent identification, even if the material is moved or rearranged. Developers indicate the version of the source code, the operating system and architecture, as well as the publication; the code and program access are then linked to DataCite’s API, which automatically generates the corresponding DOI. They can modify and update their own projects by providing new code versions. Moreover, OMICtools is implementing a dedicated GitLab service. On their GitLab page, programmers will be able to modify and update their own projects and work together to test, build, consolidate and deploy their code.
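As an illustration of the kind of information a developer supplies with each code version, here is a minimal Python sketch of a version record. It is not the OMICtools upload format or the DataCite metadata schema: the class name, the field names and the example values are all hypothetical, and no DOI is requested or generated here.

```python
import json
import platform
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class CodeVersionRecord:
    """Hypothetical description of one published version of a tool."""
    tool_name: str
    version: str
    operating_system: str
    architecture: str
    publication: str


def describe_current_platform(tool_name, version, publication):
    """Fill in the operating system and architecture from the running machine."""
    return CodeVersionRecord(
        tool_name=tool_name,
        version=version,
        operating_system=platform.system(),
        architecture=platform.machine(),
        publication=publication,
    )


def to_json(record):
    """Serialize the record with sorted keys so the output is stable."""
    return json.dumps(asdict(record), indent=2, sort_keys=True)


if __name__ == "__main__":
    record = describe_current_platform(
        tool_name="example-tool",               # placeholder name
        version="1.2.0",                        # placeholder version
        publication="placeholder reference",    # placeholder citation
    )
    print(to_json(record))
```

Keeping such a record next to each release makes it easier to cite the exact version used in an analysis.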

App developers, here are three good reasons to upload your code versions to the OMICtools repository platform:

omictools-for-software-developers

Based on the recent papers:

Taschuk M. and Wilson G. (2017) Ten simple rules for making research software more robust. PLoS Computational Biology.

Baker M. (2016) 1,500 scientists lift the lid on reproducibility. Nature.