6 ways biorepositories support clinical research


We are pleased to publish this guest blog post from Geneticist Inc.

Biorepositories help research institutions by providing tissue samples for clinical studies. Human tissue samples play a critical role in disease research by enabling assessments of molecular expression, prediction of toxicity, and identification of biomarkers. They also help clarify and expand field-of-use claims, guide the selection of appropriate species for preclinical studies, and support the clinical trial stages of drug development. Below is a list of key areas where the availability of tissues (from both humans and preclinical species) can support pharmaceutical and other research.

Assessment of Molecular Expression

Biobanks contain vast libraries of human tissue samples, allowing researchers to assess the expression levels of biological target molecules such as proteins and RNA. Methods include immunohistochemistry, in situ hybridization, western blotting, PCR and tissue microarrays, all of which can be applied to both normal and diseased tissues. Determining expression levels across a large number of tissue samples gives drug developers critical information about whether a potential drug target is appropriate. Excluding inappropriate drug targets early saves millions in funding and years of wasted research.

Analysis of DNA and RNA from formalin-fixed, paraffin-embedded (FFPE) tissue leads the way as a source of comprehensive tissue information. It enables the stratification of tissues, thus advancing our understanding of heterogeneous diseases like cancer that were previously treated without an appreciation of their inherent molecular heterogeneity.

Toxicity Predictions

By revealing altered levels of target molecule expression in organs and tissues other than those targeted by a drug, data gathered from testing tissue samples can warn researchers of unanticipated toxicity in drugs under development.

Biomarker Studies

In addition to assessing expression levels, human tissue samples provide an excellent source for the identification and clarification of biomarkers. Well-annotated tissues offer an opportunity for disease stratification that can help identify appropriate personalized therapy for patients exhibiting similar biomarker profiles.

Field of Use Claims

Once drug targets have been accurately identified in tissue samples, they can be searched for in well-classified samples from patients with different diseases. The enormous quantities of well-annotated FFPE blocks could serve as a means to expand the use of existing drugs to diseases that exhibit similar biomarkers.

Preclinical Species Selection

The selection of appropriate species for preclinical evaluation of pipeline drugs can be aided by tissue procurement from biorepository collections, particularly procurement of FFPE tissue. This is done by analyzing differences between species and selecting those whose target expression profiles, as determined by tissue arrays, are most similar to the human profile. Efficiently modeling human diseases helps drug developers avoid costly dead ends, such as testing compounds in preclinical stages that will prove ineffective or toxic in human trials.
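The comparison described above can be illustrated with a small sketch. It assumes that expression of the drug target has already been scored in the same set of tissues for humans and for each candidate species, and it uses Pearson correlation as the similarity measure. The scores, the species names and the choice of correlation are all hypothetical and for illustration only.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation between two equally long lists of scores."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


def rank_species(human_profile, species_profiles):
    """Sort species from most to least similar to the human profile."""
    scores = {
        name: pearson(human_profile, profile)
        for name, profile in species_profiles.items()
    }
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Hypothetical staining scores (0-3) for one target in
# liver, kidney, heart, lung and brain, in that order.
human = [3.0, 1.0, 0.0, 2.0, 1.0]
candidates = {
    "mouse": [0.0, 3.0, 2.0, 1.0, 0.0],
    "rat": [2.0, 1.0, 1.0, 2.0, 0.0],
    "cynomolgus": [3.0, 1.0, 0.0, 2.0, 2.0],
}

ranking = rank_species(human, candidates)
for name, score in ranking:
    print(f"{name}: r = {score:.2f}")
```

In practice the similarity measure, the tissue panel and the scoring scale would be chosen by the study team; the sketch only shows the ranking step.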

Clinical Trials

Once drug development reaches the clinical stage, embedded tissue blocks can continue to play a critical role in furthering research. Tissue samples enable patient stratification, prognostic assessments and pharmacological studies that would be impractical to perform by recruiting large numbers of trial participants.

While in vitro studies lay the foundation for a biochemical, molecular and genetic understanding of the biology of diseases, human tissue samples provide a source of information from which fundamental knowledge is transformed into actionable information.

Related publications

Conversant Bio. Well-Annotated Tissue Samples: An Essential Part of Drug Discovery.

Roswell Park Cancer Institute Blog. The Importance of Tissue Samples in Research.

McDonald, 2010. Principles of Research Tissue Banking and Specimen Evaluation from the Pathologist’s Perspective.


Evaluating the functional impact of genetic variants with COPE software


Dr Ge Gao, developer of the COPE-PCG software tool, talks here about his tool and how it can help researchers analyze sequencing data.

COPE: A new framework of context-oriented prediction for variant effects

Evaluating functional impacts of genetic variants is a key step in genomic studies. Whilst most popular variant annotation tools take a variant-centric strategy and assess the functional consequence of each variant independently, multiple variants in the same gene may interfere with each other and have different effects in combination than individually (e.g., a frameshift caused by an indel can be “rescued” by another downstream variant). The COPE framework, Context-Oriented Predictor for variant Effect, was developed to accurately annotate multiple co-occurring variants.

This new gene-centric annotation tool integrates the entire sequence context to evaluate the bona fide impact of multiple intra-genic variants in a context-sensitive approach.

COPE handles complex effects of multiple variants

Unlike the current variant-centric approach that assesses the functional consequence of each variant independently, COPE takes each functional element as the basic annotation unit and considers that multiple variants in the same functional element may interfere with each other and have different effects in combination than individually (complementary rescue effect).
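The rescue effect for frameshifts can be illustrated with a few lines of Python. This is a minimal sketch of the idea, not COPE's implementation: it assumes all indels lie in the coding sequence of the same haplotype and looks only at their net length. The function names and the example indels are hypothetical.

```python
def variant_centric_calls(indel_lengths):
    """Judge each indel on its own: any length not divisible by 3 shifts the frame."""
    return ["frameshift" if length % 3 else "in-frame" for length in indel_lengths]


def context_aware_call(indel_lengths):
    """Judge the indels together: only the net length change decides the frame."""
    return "frameshift" if sum(indel_lengths) % 3 else "frame restored"


# Hypothetical haplotype: a 1-bp insertion (+1) followed by a 4-bp deletion (-4).
haplotype_indels = [+1, -4]

print(variant_centric_calls(haplotype_indels))  # two separate frameshift calls
print(context_aware_call(haplotype_indels))     # net change of -3 bp keeps the frame
```

Note that even when the frame is restored, the residues between the two indels are still altered and could contain a premature stop codon, so a full annotation has to compare the resulting peptide with the reference protein.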

Overview of COPE: COPE uses each transcript as a basic annotation unit. The variant mapping step identifies variants within transcripts. The coding region inference step removes introns from each transcript; all possible splicing patterns are taken into consideration for splice-altering transcripts (in this case, the red dot indicates a splice acceptor site SNP, and intron retention and exon skipping are taken into consideration). The sequence comparison step compares a ‘mutant peptide’ against a reference protein sequence to obtain the final amino acid alteration.

Applying COPE software to genomic data

Screening the official 1000 Genomes variant set, COPE identified a considerable number of false-positive Loss-of-Function calls, accounting for 23.21% of splice-disrupting variants, 6.45% of frameshift indels and 2.10% of stop-gained variants, as well as several false-negative Loss-of-Function variants in 38 genes.

To the best of our knowledge, COPE is the first fully gene-centric tool for annotating the effects of variants in a context-sensitive approach.


Schematic diagram of typical types of annotation corrections implemented in COPE. A rescued stop-gained SNV indicates that another SNV (‘A’ to ‘C’) in the same codon rescues a variant-centric stop-gained SNV (‘A’ to ‘T’). Stop-gained MNV indicates that two or more SNVs result in a stop codon (‘A’ to ‘T’ and ‘C’ to ‘G’). A rescued frameshift indel indicates that another indel in the same haplotype recovers the original open reading frame. A splicing-rescued stop-gained/frameshift variant indicates that a stop-gained or frameshift variant is rescued by a novel splicing isoform. A rescued splice-disrupting variant indicates that a splice-disrupting variant is rescued by a nearby cryptic site (as shown in the figure) or a novel splice site. The asterisk in the figure indicates a stop codon.
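The first two correction types in the schematic (a rescued stop-gained SNV and a stop-gained MNV) can be reproduced at the level of a single codon. The sketch below is an illustration under simple assumptions, not COPE's code: it uses the standard genetic code, works on one codon at a time, and the reference codons `AAA` and `ACA` are hypothetical examples chosen to match the base changes named in the caption.

```python
# Standard genetic code, with codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def apply_snvs(codon, snvs):
    """Apply SNVs given as (position 0-2, alternative base) to a codon."""
    bases = list(codon)
    for position, alt in snvs:
        bases[position] = alt
    return "".join(bases)


def effect(ref_codon, alt_codon):
    """Classify the change from the reference codon to the altered codon."""
    ref_aa, alt_aa = CODON_TABLE[ref_codon], CODON_TABLE[alt_codon]
    if alt_aa == ref_aa:
        return "synonymous"
    if alt_aa == "*":
        return "stop-gained"
    if ref_aa == "*":
        return "stop-lost"
    return "missense"


def annotate(ref_codon, snvs):
    """Return (effects of each SNV alone, effect of all SNVs combined)."""
    individual = [effect(ref_codon, apply_snvs(ref_codon, [snv])) for snv in snvs]
    combined = effect(ref_codon, apply_snvs(ref_codon, snvs))
    return individual, combined


# Rescued stop-gained SNV: AAA -> TAA is a stop alone, but AAA -> TAC is tyrosine.
print(annotate("AAA", [(0, "T"), (2, "C")]))
# Stop-gained MNV: neither SNV creates a stop alone, but ACA -> TGA does.
print(annotate("ACA", [(0, "T"), (1, "G")]))
```

Annotating each SNV independently gets both cases wrong, which is the motivation for treating the whole functional element as the annotation unit.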

Evaluating the quality of COPE: availability, usability and flexibility

  • Free software
  • Publicly available online server and stand-alone package for large-scale analysis


Screenshot of the COPE web server. Example of input (A) and annotation by COPE (B)

  • Software documentation: A detailed guideline for installation and setup is available
  • Recent updates: COPE-PCG has been online since June 2016, and COPE-TFBS since March 2017 on a new website
  • Analysis of protein-coding genes (COPE-PCG), transcription factor binding sites (COPE-TFBS) and more… the COPE framework may also be extended and adapted to non-coding RNAs and miRNAs in the near future.

About the author

Dr Ge Gao is a principal investigator at the Center for Bioinformatics of Peking University. His team focuses primarily on developing novel computational techniques to analyze, integrate and visualize high-throughput biological data effectively and efficiently, with applications for deciphering the function and evolution of gene regulatory systems. Dr Ge Gao specializes in large-scale data mining, using a combination of statistical learning, high-performance computing, and data visualization.


Cheng et al., 2017. Accurately annotate compound effects of genetic variants using a context-sensitive framework. Nucleic Acids Research.

Cheng et al., in preparation. Systematically identify and annotate multiple-variant compound effect at transcription factor binding sites in the human genome.

Your Top CRISPR/Cas9 software tools


The development of CRISPR-Cas9 systems has revolutionized genome engineering in living organisms. This novel technology opens up a new era in genomics, along with a wide range of applications. Several bioinformatics tools have recently been developed for researchers designing CRISPR/Cas9 experiments, and analyzing and evaluating CRISPR/Cas9 genome editing.

A few weeks ago, we asked OMICtools members to choose their top 3 favorite CRISPR/Cas9 tools among those most used by the scientific community. Here are the results of your votes.

Gold medals for CRISPR-GA, CROP-IT and CRISPRTarget tools

Three web applications came out equally on top, each voted a number 1 tool by 45% of the users surveyed: CRISPR-GA (CRISPR Genome Analyzer), CROP-IT (CRISPR/Cas9 Off-target Prediction and Identification Tool) and CRISPRTarget.

The CRISPR-GA platform has become an essential tool for anyone wanting to assess the quality of their CRISPR/Cas9 experiment. It provides an easy (three mouse clicks), sensitive (detection limit <0.1%), and comprehensive analysis of gene editing results. The CRISPR-GA platform maps the reads, estimates and locates insertions and deletions, computes the allele replacement efficiency, and then provides you with a report integrating all this information.

CRISPR-GA pipeline. (A) From experiment to report. Schematic pipeline of a gene editing assessment. (B) Output of CRISPR-GA estimating a range of information. Deletions, insertions, homologous recombination (HR) and corresponding efficiencies. Upper panels estimate the number of insertions and deletions and each corresponding size. Middle panels estimate the number of insertions and deletions, and their corresponding location within the genomic locus of interest. The bottom panel shows the number of deletions and HRs at each corresponding location, and outputs the HR and NHEJ (non-homologous end-joining) efficiency. (C) Experimental results assessed by CRISPR-GA from testing several mutants of cas9, gRNAs and a DNA template. HR and NHEJ values are shown. From Güell et al., 2014. Genome editing assessment using CRISPR Genome Analyzer (CRISPR-GA). Bioinformatics.
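To make the HR and NHEJ efficiencies more concrete, here is a minimal sketch of how such values could be expressed once each mapped read has been classified. It is not CRISPR-GA's code: the read labels, the example counts and the definition of efficiency as a simple fraction of all reads are assumptions made for illustration.

```python
from collections import Counter


def editing_efficiencies(read_labels):
    """Return NHEJ and HR efficiencies as fractions of all classified reads.

    Assumption: reads carrying an insertion or deletion count as NHEJ events,
    and reads carrying the template sequence count as HR events.
    """
    counts = Counter(read_labels)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no reads to evaluate")
    nhej_reads = counts["insertion"] + counts["deletion"]
    return {"NHEJ": nhej_reads / total, "HR": counts["hr"] / total}


# Hypothetical classification of 100 reads at the edited locus.
reads = ["wild_type"] * 80 + ["deletion"] * 12 + ["insertion"] * 3 + ["hr"] * 5

efficiencies = editing_efficiencies(reads)
print(efficiencies)
```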

  • CROP-IT (CRISPR/Cas9 Off-Target Prediction and Identification Tool)

CROP-IT is a user-friendly web application where users can design optimal sgRNA guiding sequences and search for potential off-target binding or cleavage sites. The CROP-IT tool integrates knowledge from experimentally identified Cas9 binding sites and cleavage sites, as well as information on chromatin state (data from multiple studies and 125 cell types). CROP-IT scores potential off-target Cas9 binding and cleavage sites and outputs a list of the top sites.


Schematic of CROP-IT algorithm based on a computational model where each position of the guiding RNA sequence is differentially weighted based on experimental Cas9 binding and cleavage site information from multiple independent sources. Furthermore, it incorporates chromatin state information for the human genome by analyzing accessible chromatin regions from 125 human cell types. By integrating observed information from Cas9 DNA binding, CROP-IT performs significantly better than existing computational prediction tools. From Singh et al., 2015. Cas9-chromatin binding information enables more accurate CRISPR off-target prediction. Nucleic Acids Research. 
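The idea of weighting each guide position differently can be sketched as follows. This is an illustration only, not the CROP-IT model: the linear weights, the penalty for closed chromatin, the function names and the example sequences are all hypothetical. The sketch assumes the guide is written 5' to 3' with the PAM on the 3' side, so later positions are PAM-proximal and mismatches there are penalized more.

```python
def off_target_score(guide, site, weights, accessible=True, closed_chromatin_factor=0.5):
    """Score a candidate site between 0 and 1; 1 means a perfect, accessible match."""
    if not (len(guide) == len(site) == len(weights)):
        raise ValueError("guide, site and weights must have the same length")
    # Sum the weights of all mismatched positions.
    penalty = sum(w for g, s, w in zip(guide, site, weights) if g != s)
    score = 1.0 - penalty / sum(weights)
    # Hypothetical adjustment: sites in closed chromatin are down-weighted.
    return score if accessible else score * closed_chromatin_factor


def rank_sites(guide, sites, weights, top=10):
    """Return the top-scoring (name, score) pairs, best first."""
    scored = [
        (name, off_target_score(guide, seq, weights, accessible))
        for name, seq, accessible in sites
    ]
    return sorted(scored, key=lambda item: item[1], reverse=True)[:top]


guide = "GACGTTACCGGATCAAGCTT"
weights = [position + 1 for position in range(len(guide))]  # hypothetical weights

candidate_sites = [
    ("proximal_mismatch", "GACGTTACCGGATCAAGCTA", True),   # mismatch next to the PAM
    ("closed_perfect", "GACGTTACCGGATCAAGCTT", False),     # perfect match, closed chromatin
    ("distal_mismatch", "TACGTTACCGGATCAAGCTT", True),     # mismatch far from the PAM
]

ranked = rank_sites(guide, candidate_sites, weights)
for name, score in ranked:
    print(f"{name}: {score:.3f}")
```

In the real tool the weights come from experimental binding and cleavage data rather than a fixed formula.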

CRISPRTarget is one of the first tools developed for predicting the targets of CRISPR RNA spacers. This web application interactively explores diverse databases. CRISPRTarget provides the flexibility to search for matches in either or both orientations of the input, and to discover targets with protospacer adjacent motifs, as well as any adjacent pairing potential.
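Searching in both orientations amounts to looking for the spacer and for its reverse complement in the target sequence, then reporting the flanking bases so that a protospacer adjacent motif can be checked. The sketch below shows this under strong simplifications: it finds exact matches only, whereas a real search would tolerate mismatches and score them, and the function names and sequences are hypothetical.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def find_protospacers(spacer, target, flank=3):
    """Find exact spacer matches on both strands and report flanking bases."""
    hits = []
    for strand, query in (("+", spacer), ("-", reverse_complement(spacer))):
        start = target.find(query)
        while start != -1:
            end = start + len(query)
            hits.append({
                "strand": strand,
                "start": start,
                "upstream": target[max(0, start - flank):start],
                "downstream": target[end:end + flank],
            })
            start = target.find(query, start + 1)
    return hits


spacer = "GATTACAGG"
# Hypothetical target containing the spacer once on each strand.
target = "TTT" + spacer + "AAA" + reverse_complement(spacer) + "GGG"

hits = find_protospacers(spacer, target)
for hit in hits:
    print(hit)
```

The reported flanks are given in target coordinates; for a match on the minus strand they would need to be reverse-complemented before comparing them with a motif.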


Graphical output of CRISPRTarget. Output of a search for targets of the Streptococcus thermophilus DGCC7710 CRISPR array. The direction of transcription is known; however, both strands are shown in the diagram as if the direction of transcription was unknown. Two relatively low-scoring matches using these interactive settings are shown (rank 44–45). They have good spacer-protospacer base pairing but lack a WTTCTNN PAM. Match 45 is a match to a phage to which this strain is sensitive (Φ2972). Yellow indicates spacer/protospacer, blue shows flanking sequences, and mismatches between the crRNA and the target DNA protospacer are indicated in red. From Biswas et al., 2013. CRISPRTarget: bioinformatic prediction and analysis of crRNA targets. RNA Biology.

Silver medal for ZiFiT

Second place went to ZiFiT (Zinc Finger Targeter v4.1), with 36% of the votes.

Originally developed to identify potential zinc finger nuclease (ZFN) sites in target sequences, ZiFiT also provides support for the identification of CRISPR/Cas target sites and reagents, as well as user-friendly guidance for the construction of TALEN-encoding plasmids.

(Sander et al., 2010. ZiFiT (Zinc Finger Targeter): an updated zinc finger engineering tool. Nucleic Acids Research.)

Bronze medals for Crass and MAGeCK tools

Equal third place went to Crass (CRISPR Assembler) and MAGeCK (Model-based Analysis of Genome-wide CRISPR/Cas9 Knockout), with 31% of the votes each.

  • Crass (CRISPR Assembler)

Crass identifies and reconstructs CRISPR loci and spacers from raw metagenomic data without the need for assembly or prior knowledge of CRISPR in the data set. The sensitivity, specificity and speed of Crass facilitate analysis of metagenomic data, phage-host interactions and co-evolution within microbial communities.


Comparison between different CRISPR loci visualization techniques. (A) Traditional approach to visualization where the spacers are shown as differently colored rectangles (the same color refers to the same spacer) anchored to the leader sequence (white triangle). (B) The same CRISPR loci reconstructed by Crass into a spacer graph. From Skennerton et al., 2013. Crass: identification and reconstruction of CRISPR from unassembled metagenomic data. Nucleic Acids Research.

  • MAGeCK (Model-based Analysis of Genome-wide CRISPR/Cas9 Knockout)

The MAGeCK algorithm was developed by Li et al. (Genome Biol. 2014) for prioritizing single-guide RNAs, genes and pathways in genome-scale CRISPR/Cas9 knockout screens. It identifies both positively and negatively selected genes simultaneously, and reports robust results across different experimental conditions. This computational method, with a low false discovery rate (FDR) and high sensitivity, brings new clues for answering biological questions and addressing therapeutic needs. 
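To give a feel for what positive and negative selection look like in screen data, the sketch below ranks genes by the median log2 fold change of their sgRNAs between a control and a treated sample. This is a deliberately simplified stand-in and not the MAGeCK model, which relies on its own statistical testing and ranking of sgRNAs. The read counts, gene names, library-size scaling and pseudocount are all hypothetical.

```python
from math import log2
from statistics import median


def rank_genes(counts, pseudocount=1.0):
    """Rank genes by the median log2 fold change of their sgRNAs.

    counts: list of (gene, control_reads, treatment_reads), one entry per sgRNA.
    Returns (gene, median log2 fold change) pairs, most depleted first.
    """
    control_total = sum(control for _, control, _ in counts)
    treatment_total = sum(treatment for _, _, treatment in counts)
    # Scale the treated sample to the library size of the control sample.
    scale = control_total / treatment_total

    per_gene = {}
    for gene, control, treatment in counts:
        fold_change = (treatment * scale + pseudocount) / (control + pseudocount)
        per_gene.setdefault(gene, []).append(log2(fold_change))

    summary = [(gene, median(values)) for gene, values in per_gene.items()]
    return sorted(summary, key=lambda item: item[1])


# Hypothetical read counts for two sgRNAs per gene.
screen_counts = [
    ("GENE_A", 100, 10),
    ("GENE_A", 120, 20),
    ("GENE_B", 100, 110),
    ("GENE_B", 80, 90),
    ("GENE_C", 100, 270),
    ("GENE_C", 100, 100),
]

gene_ranking = rank_genes(screen_counts)
for gene, lfc in gene_ranking:
    print(f"{gene}: median log2 fold change = {lfc:.2f}")
```

Genes at the top of the list are candidates for negative selection and genes at the bottom for positive selection; a real analysis would add statistical testing and multiple-testing correction.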

Follow this tutorial to see how the MAGeCK algorithm works.

Stay tuned for more feedback from the OMICtools community on the latest and best tools to use!