CircRNAs in human and mouse tissues


Leng Han and Chunjiang He are the creators of the tissue-specific circular RNA (circRNA) database TSCD. They performed the first global analysis of tissue-specific circRNAs and collected these data in a comprehensive database. Here, they talk about their work and how the TSCD database can help researchers explore their RNA sequencing data.

A repository of more than 300,000 tissue-specific circRNAs

With circRNAs attracting more attention in transcriptome research, we explored the global features of tissue-specific circRNAs in embryo development and organ differentiation. To identify tissue-specific circRNAs, we applied three algorithms (CIRI, circRNA_finder and find_circ) to RNA-seq data collected from the ENCODE project and the NCBI GEO database.

Based on the major types of circRNA, we identified more than 300,000 tissue-specific circRNAs in different tissues. Our analyses indicated that tissue-specific circRNAs were mainly derived from exons, although they can also be derived from introns or intergenic regions. The majority are generated from protein-coding genes, which suggests that these circRNAs are associated with mRNA translation or are an mRNA backup.

Among all circRNAs, 10.4% of human circRNAs and 34.3% of mouse circRNAs are tissue-specific, which suggests a link with tissue development. We also observed an uneven distribution of tissue-specific circRNAs across tissues: more of them are expressed in the brain than in other tissues (89,137 were identified in fetal brain), which may be due to the complexity of neuronal activity in the brain.


Abundance of TS circRNAs across different tissues: (A) 16 adult human tissue types; (B) 15 fetal human tissues; and (C) 9 mouse tissues (in log2 of SRPTM: number of circular reads/number of mapped reads (units in trillion)/read length).
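The SRPTM measure in the caption can be sketched in a few lines of Python. This is only an illustration: it assumes the formula reads as circular reads divided by mapped reads (expressed in trillions), divided again by read length, and the function names and example numbers are hypothetical, not taken from TSCD.

```python
import math


def srptm(circular_reads: int, mapped_reads: int, read_length: int) -> float:
    """Circular reads per trillion mapped reads, scaled by read length.

    Assumed reading of the caption's formula:
    circular reads / (mapped reads in trillions) / read length.
    """
    if mapped_reads <= 0 or read_length <= 0:
        raise ValueError("mapped_reads and read_length must be positive")
    mapped_in_trillions = mapped_reads / 1e12
    return circular_reads / mapped_in_trillions / read_length


def log2_srptm(circular_reads: int, mapped_reads: int, read_length: int) -> float:
    """log2 of SRPTM, the scale used in the abundance figure."""
    return math.log2(srptm(circular_reads, mapped_reads, read_length))


# Hypothetical example: 10 backsplice reads, 50 million mapped reads, 100 bp reads.
value = srptm(10, 50_000_000, 100)
print(round(value, 3), round(log2_srptm(10, 50_000_000, 100), 3))
```

Dividing by read length makes samples sequenced with different read lengths more comparable, since longer reads are more likely to span a backsplice junction.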

Functional enrichment analysis revealed that tissue-specific circRNAs are largely associated with tissue development and differentiation. To understand the potential functions of tissue-specific circRNAs, we identified a significant number of miRNA binding elements (MREs) and RNA-binding protein (RBP) binding sites.

Finding a tissue-specific circRNA in TSCD

Users can easily browse TSCD content via a browser page and can view tissue-specific circRNAs by selecting:

  • Human adult tissue, human fetal tissue, or mouse tissue
  • And one of the 26 individual tissue types including adipose, adrenal, blood vessel, brain, esophagogastric, esophagus, eye, female gonad, heart, intestine, kidney, liver, lung, mammary gland, pancreas, skeletal muscle, skin, spleen, stomach, testis, thymus, thyroid gland, tibial nerve, tongue, umbilical cord, and uterus.

Data organization and visualization on the TSCD web interface

All data have been organized into a set of relational MySQL tables. Customized Java and PHP scripts were used to construct the database interface. The visualization page displays the coordinates of each circRNA.

The index page allows the user to easily query the information concerning TS circRNAs by chromosome, start and end site, junction read, conservation, genomic location, etc.
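To make this kind of coordinate query concrete, here is a minimal sketch. It is not the TSCD schema: the table, column names and rows are hypothetical, and it uses Python's built-in sqlite3 module in place of MySQL so that it runs without a server.

```python
import sqlite3


def build_demo_db() -> sqlite3.Connection:
    """Create an in-memory table with made-up circRNA records."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        """CREATE TABLE circrna (
               circ_id TEXT, tissue TEXT, chrom TEXT,
               start_pos INTEGER, end_pos INTEGER,
               strand TEXT, junction_reads INTEGER)"""
    )
    rows = [
        ("circ_demo_1", "brain", "chr1", 1000, 5000, "+", 12),
        ("circ_demo_2", "liver", "chr1", 4000, 9000, "-", 3),
        ("circ_demo_3", "brain", "chr2", 1000, 5000, "+", 8),
    ]
    conn.executemany("INSERT INTO circrna VALUES (?, ?, ?, ?, ?, ?, ?)", rows)
    return conn


def find_overlapping(conn, chrom, start, end, min_junction_reads=0):
    """Return IDs of circRNAs on `chrom` that overlap the interval [start, end]."""
    cursor = conn.execute(
        """SELECT circ_id FROM circrna
           WHERE chrom = ? AND start_pos <= ? AND end_pos >= ?
             AND junction_reads >= ?
           ORDER BY start_pos""",
        (chrom, end, start, min_junction_reads),
    )
    return [row[0] for row in cursor]


conn = build_demo_db()
print(find_overlapping(conn, "chr1", 4500, 6000))
print(find_overlapping(conn, "chr1", 4500, 6000, min_junction_reads=5))
```

Two intervals overlap when each one starts before the other ends, which is why the query compares the stored start with the query end and the stored end with the query start.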

Web interface of TSCD.

  1. Users can view comprehensive information such as tissue category, circRNA ID, coordinates of backsplice sites, genomic locations, junction reads, strand information, genomic spanning length, gene annotation and MRE/RBP sites.
  2. More importantly, users can visualize the details of tissue-specific circRNA through the gene symbol link. Backsplices of circRNA are represented by arcs: a black arc for non-specific circRNAs, a red arc for tissue-specific circRNAs.
  3. Annotated exons and introns of reference transcripts are shown. If the reference gene has multiple transcripts, all transcripts are displayed. If the circRNA is generated from multiple genes, the exon structures of all related genes are displayed to better illustrate the biogenesis of the circRNA. TSCD also provides tables with the precise coordinates of each circRNA backsplice across different tissues.

Exploring tissue-specific circRNAs with TSCD

TSCD offers several pages that are of benefit to the research community:

  • The Browser-hg38|mm10 page, which displays coordinates for each circRNA based on the latest genome versions, GRCh38 and mm10.
  • The comparison page, which allows users to compare circRNAs among different tissues.
  • The download page, which allows users to batch download tissue-specific circRNAs from all tissues, along with a customized Perl script for identifying tissue-specific circRNAs in their own RNA-seq data.
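The idea behind comparing circRNAs among tissues can be sketched as follows. This is not the TSCD Perl script, and the criterion is an assumption made for illustration: a circRNA is treated as tissue-specific when its backsplice is detected in exactly one tissue. The function name, tissue names and coordinates are hypothetical.

```python
from collections import defaultdict


def tissue_specific(detected):
    """Map each tissue to the circRNAs detected in that tissue only.

    `detected` maps a tissue name to a set of circRNA identifiers, here
    written as "chrom:start-end:strand" backsplice coordinates.
    Assumed criterion (not necessarily TSCD's): seen in exactly one tissue.
    """
    tissues_by_circ = defaultdict(set)
    for tissue, circs in detected.items():
        for circ in circs:
            tissues_by_circ[circ].add(tissue)

    specific = {tissue: set() for tissue in detected}
    for circ, tissues in tissues_by_circ.items():
        if len(tissues) == 1:
            (only_tissue,) = tissues
            specific[only_tissue].add(circ)
    return specific


# Hypothetical detections in three tissues.
detected = {
    "brain": {"chr1:1000-5000:+", "chr2:300-900:-", "chr3:50-700:+"},
    "liver": {"chr1:1000-5000:+", "chr4:10-400:+"},
    "heart": {"chr1:1000-5000:+", "chr3:50-700:+"},
}
for tissue, circs in tissue_specific(detected).items():
    print(tissue, sorted(circs))
```

A real pipeline would also need expression thresholds and consistent backsplice calls across algorithms before applying a rule like this one.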

References

(Xia et al., 2016) Comprehensive characterization of tissue-specific circular RNAs in the human and mouse genomes. Brief Bioinform.

Your Top 3 Circos plot generation tools


Making great images of your data

With the growing amount of biological data being generated, innovative bioinformatics tools have been developed for modelling and synthesizing complex information in comprehensive figures. Several types of infographics are now available for clear, informative representation and analysis of your data; the right one depends on the specific domain and question you are studying.

So how do you choose the best tools to efficiently explore your data and illustrate your scientific findings?

To help you answer this key question, we have initiated a series of surveys with users on the main categories of data visualization tools among those which are most used by the OMICtools community. The first of our survey series concerns the Circos plot generation tools.

Using Circos plots

Circos plots allow you to visualize data in a circular layout. This kind of representation is particularly useful for integrating and comparing large amounts of data, and it is one of the best ways to show relationships between elements. The Circos plot has become a standard method for presenting genomics and epigenomics data, genome annotation and comparative genomics, offering fine visualization of sequence alignments, conservation, synteny, rearrangements, gene expression, methylation levels, and more. Circos plots can also be used to display any kind of data with multi-layer features and relationships.

Here are the top 3 tools for creating Circos plots, as selected by 65 of you, OMICtools members.

The Gold medal goes to the popular Circos tool

Your #1 top tool is the well-known command-line based Circos software, with 66% of the votes.

Originally conceived for visualizing genomic data such as alignments and structural variations, Circos uses a circular ideogram layout that can display data as a scatter, line or histogram plots, heat maps, tiles, connectors, and text.


Circos has features that make it ideal for drawing genomic information. Shown here are ChIP-Seq, chr 22 methylation, whole-genome methylation, multi-species comparison, human genome variation and self-similarity, and MLL recombinome.

Circos is a free command-line application written in Perl. It can be deployed on any operating system for which Perl is available (e.g. Windows, Mac OS X, Linux and other UNIX). Circos produces bitmap (PNG) and vector (SVG) images using plain text configuration and input files. A very complete website with documentation is available with a series of 8 online tutorials presenting each specific feature of Circos, a quick guide, support through the Circos forum, as well as several examples of published images.
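As an illustration of the plain-text input files, the sketch below formats binned values as a whitespace-delimited data track, one line per bin with chromosome, start, end and value, which is the layout Circos reads for tracks such as histograms. The helper name and numbers are hypothetical, and the chromosome IDs (here `hs1`, `hs2`) are assumed to match those defined in the karyotype file you use.

```python
def to_circos_track(bins):
    """Format (chromosome, start, end, value) tuples as Circos data lines."""
    lines = [f"{chrom} {start} {end} {value}" for chrom, start, end, value in bins]
    return "\n".join(lines) + "\n"


# Hypothetical per-bin counts on two chromosomes.
bins = [
    ("hs1", 0, 999999, 12),
    ("hs1", 1000000, 1999999, 7),
    ("hs2", 0, 999999, 3),
]
track_text = to_circos_track(bins)
print(track_text, end="")
```

The resulting text can be saved to a file and referenced from a plot block in the Circos configuration.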


For the last 10 years, this tool has helped thousands of scientists from various fields create beautiful representations of their data. Circos has been used and referenced in more than 500 scientific publications, as well as in a wider range of publications such as the New York Times.

Silver medal for BioCircos.js and ggbio tools

The second place went to the BioCircos.js library and the R package ggbio, with 40% of the votes each.

Web visualization applications have the advantage of generating interactive graphs, in which elements respond to the mouse with mouse-over explanations and clickable buttons. This provides a more user-friendly Circos plot representation with easily accessible information.

BioCircos.js is an open source interactive JavaScript library, based on the D3 (Data-Driven Documents) and jQuery JavaScript libraries. It offers flexible plugins and powerful functionality for developers who need to build web-based applications for Circos plot generation. BioCircos.js supports multiple platforms and works in all major internet browsers (Google Chrome is recommended). BioCircos.js version 1.1 has been available since September 2016, along with updated documentation. Several modules are provided (SNP, CNV, HEATMAP, LINK, LINE, SCATTER, ARC, TEXT, and HISTOGRAM) to display genome-wide genetic variations (SNPs, CNVs and chromosome rearrangements), gene expression and biomolecule interactions.

The ggbio R package (version 1.24.1) offers the advantage of using the statistical functionality available in R as well as the grammar of graphics and the data handling capabilities of the Bioconductor project. A quick start guide and a manual were also released with Bioconductor. This tool has mainly been used to explore genome annotations and HTS data. The figures provide detailed views of genomic regions, sequence alignments and splicing patterns, and genome-wide overviews with karyogram, circular and grand linear layouts.

ggbio application: Representation of copy number whole-genome profiles of five follicular lymphoma tumor samples generated from the Affymetrix Mapping 500K array. From Yin et al., 2012. Genome Biology.

Bronze medal for the recent CircosVCF tool

The third place went to the web application CircosVCF with 34% of the votes.

CircosVCF is a free interactive web interface designed for visualizing variants in genome-wide datasets. It was implemented in JavaScript and supports several browsers (Chrome, Firefox, Internet Explorer 10+, Edge). CircosVCF provides Circos visualization of input files in the standard Variant Call Format, including large VCF files. It offers a simple, user-friendly graphical interface for creating Circos plots with an interactive design and for integrating additional information such as experimental data or annotations. The visualization capabilities of CircosVCF give a global overview of relationships between genomes and allow identification of SNP regions.
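To show what the VCF input looks like to a program, here is a small sketch that counts SNPs per chromosome. It does not use CircosVCF itself; the function name and sample records are hypothetical, and it relies only on the standard tab-separated VCF columns (CHROM, POS, ID, REF, ALT, ...). A record is counted as a SNP here when REF and every ALT allele is a single base.

```python
from collections import Counter

BASES = {"A", "C", "G", "T"}


def count_snps(vcf_lines):
    """Count single-nucleotide variants per chromosome in VCF text lines."""
    counts = Counter()
    for line in vcf_lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines, meta-information and the header
        fields = line.rstrip("\n").split("\t")
        chrom, ref, alt = fields[0], fields[3], fields[4]
        if ref in BASES and all(allele in BASES for allele in alt.split(",")):
            counts[chrom] += 1
    return counts


# Hypothetical VCF content: two SNPs on chr1, one deletion, one SNP on chr2.
sample = [
    "##fileformat=VCFv4.2",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO",
    "chr1\t100\t.\tA\tG\t50\tPASS\t.",
    "chr1\t200\t.\tAT\tA\t50\tPASS\t.",
    "chr1\t300\t.\tC\tT,G\t50\tPASS\t.",
    "chr2\t150\t.\tG\tC\t50\tPASS\t.",
]
print(dict(count_snps(sample)))
```

Per-chromosome or per-window counts like these are the kind of summary a circular density track displays.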


Our next survey on data visualization will focus on heatmap generation tools. You are welcome to participate!

References

(Krzywinski et al., 2009) Circos: an information aesthetic for comparative genomics. Genome Research.
(Cui et al., 2016) BioCircos.js: an interactive Circos JavaScript library for biological data visualization on web applications. Bioinformatics.
(Yin et al., 2012) ggbio: an R package for extending the grammar of graphics for genomic data. Genome Biology.
(Drori et al., 2017) CircosVCF: circos visualization of whole-genome sequence variations stored in VCF files. Bioinformatics.

6 ways biorepositories support clinical research

Guest blog post from Geneticist Inc.

Biorepositories help research institutions by providing tissue samples for clinical studies. Human tissue samples play a critical role in disease research by enabling assessments of molecular expression, prediction of toxicity, and identification of biomarkers. They help clarify and expand field-of-use claims, guide the selection of appropriate species for preclinical studies, and assist in the clinical trial stages of drug development. Below is a list of key areas where the availability of tissues (both from humans and preclinical species) can support pharmaceutical and other research.

Assessment of Molecular Expression

Biobanks contain vast libraries of human tissue samples, allowing for the assessment of expression levels of biological target molecules such as proteins and RNA. Methods include immunohistochemistry, in situ hybridization, western blotting, PCR and tissue microarrays, all of which can be applied to both normal and diseased tissues. Determining expression levels in a large volume of tissue samples provides critical information to drug developers, allowing them to assess the appropriateness of a potential drug target. Excluding inappropriate drug targets saves millions in funding and years of wasted research.

FFPE DNA and RNA analysis leads the way as a source of comprehensive tissue information. It enables the stratification of tissues, thus advancing our understanding of heterogeneous diseases like cancer that were previously treated without an appreciation of their inherent molecular heterogeneity.

Toxicity Predictions

By illuminating altered levels of target molecule expression in organs and tissues outside of those targeted by a drug, data gathered from testing tissue samples can warn researchers of unanticipated toxicity in drugs under development.

Biomarker Studies

In addition to assessing expression levels, human tissue samples provide an excellent source for the identification and clarification of biomarkers. Well-annotated tissues offer an opportunity for disease stratification that can help identify appropriate personalized therapy for patients exhibiting similar biomarker profiles.

Field of Use Claims

Once drug targets have been accurately identified in tissue samples, they can be searched for in well-classified samples from patients with different diseases. The enormous quantities of well-annotated FFPE blocks could serve as a means to expand the use of existing drugs to diseases that exhibit similar biomarkers.

Preclinical Species Selection

The selection of appropriate species for preclinical evaluation of pipeline drugs can be aided by tissue procurement from biorepository collections, particularly procurement of FFPE tissue. This is done by analyzing differences and selecting the species with the most similar target compound expression profiles, as determined by tissue arrays. Efficiently modeling human diseases helps drug developers avoid costly dead ends, such as testing compounds in preclinical stages that will prove ineffective or toxic in human trials.

Clinical Trials

Once drug development reaches the clinical stage, embedded tissue blocks can continue to play a critical role in furthering research. Tissue samples enable patient stratification, prognostic assessments and pharmacological studies that would be impractical to perform by acquiring large numbers of trial participants.

While in vitro studies lay the foundation for a biochemical, molecular and genetic understanding of the biology of diseases, human tissue samples provide a source of information from which fundamental knowledge is transformed into actionable information.

Related publications

Conversant Bio. Well-Annotated Tissue Samples: An Essential Part of Drug Discovery.

Roswell Park Cancer Institute Blog. The Importance of Tissue Samples in Research.

McDonald, 2010. Principles of Research Tissue Banking and Specimen Evaluation from the Pathologist’s Perspective.