Evaluating the functional impact of genetic variants with COPE software


Dr Ge Gao, developer of the COPE-PCG software tool, talks here about his tool and how it can help researchers analyze sequencing data.

COPE: A new framework of context-oriented prediction for variant effects

Evaluating functional impacts of genetic variants is a key step in genomic studies. Whilst most popular variant annotation tools take a variant-centric strategy and assess the functional consequence of each variant independently, multiple variants in the same gene may interfere with each other and have different effects in combination than individually (e.g., a frameshift caused by an indel can be “rescued” by another downstream variant). The COPE framework, Context-Oriented Predictor for variant Effect, was developed to accurately annotate multiple co-occurring variants.

This new gene-centric annotation tool integrates the entire sequence context to evaluate the bona fide impact of multiple intra-genic variants in a context-sensitive approach.

COPE handles complex effects of multiple variants

Unlike the current variant-centric approach that assesses the functional consequence of each variant independently, COPE takes each functional element as the basic annotation unit and considers that multiple variants in the same functional element may interfere with each other and have different effects in combination than individually (complementary rescue effect).

Overview of COPE: COPE uses each transcript as a basic annotation unit. The variant mapping step identifies variants within transcripts. The coding region inference step removes introns from each transcript; all possible splicing patterns are taken into consideration for splice-altering transcripts (in this case, the red dot indicates a splice acceptor site SNP, and intron retention and exon skipping are taken into consideration). The sequence comparison step compares a ‘mutant peptide’ against a reference protein sequence to obtain the final amino acid alteration.
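The steps described above (map the variants, build the mutant coding sequence, compare the mutant peptide with the reference) can be sketched in a few lines of Python. This is only a minimal illustration and not COPE's actual code: the function names, the toy coding sequence and the VCF-like (position, ref, alt) tuples are all assumptions made for the example, and splicing is ignored entirely.

```python
from itertools import product

# Standard genetic code, built in TCAG order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): a for c, a in zip(product(BASES, repeat=3), AMINO)}


def translate(cds):
    """Translate a coding sequence until the first stop codon."""
    protein = []
    for i in range(0, len(cds) - 2, 3):
        amino_acid = CODON_TABLE[cds[i:i + 3]]
        if amino_acid == "*":
            break
        protein.append(amino_acid)
    return "".join(protein)


def apply_variants(cds, variants):
    """Apply every variant of one haplotype to the coding sequence.

    variants: list of (0-based position, ref allele, alt allele).
    Variants are applied right to left so earlier coordinates stay valid.
    """
    for pos, ref, alt in sorted(variants, reverse=True):
        if cds[pos:pos + len(ref)] != ref:
            raise ValueError(f"reference allele mismatch at position {pos}")
        cds = cds[:pos] + alt + cds[pos + len(ref):]
    return cds


def frame_preserved(variants):
    """True if the net length change of all variants is a multiple of 3."""
    return sum(len(alt) - len(ref) for _, ref, alt in variants) % 3 == 0


# Toy example (hypothetical sequence and variants).
REFERENCE_CDS = "ATGGCTGAACTGAAATGGTAA"
DELETION = (3, "GC", "G")    # 1-bp deletion: a frameshift on its own
INSERTION = (9, "C", "CT")   # 1-bp insertion further downstream

alone = translate(apply_variants(REFERENCE_CDS, [DELETION]))
together = translate(apply_variants(REFERENCE_CDS, [DELETION, INSERTION]))
print(translate(REFERENCE_CDS), alone, together)
```

In this toy case the deletion alone truncates the peptide (MAELKW becomes MVN), whereas the haplotype carrying both indels yields MVNLKW: two substituted residues rather than a loss of function. A variant-centric annotation would report two frameshifts; evaluating the whole sequence context shows that the reading frame is restored.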

Applying COPE software to genomic data

Screening the official 1000 Genomes variant set, COPE identified a considerable number of false-positive Loss-of-Function calls (23.21% of splice-disrupting variants, 6.45% of frameshift indels and 2.10% of stop-gained variants), as well as several false-negative Loss-of-Function variants in 38 genes.

To the best of our knowledge, COPE is the first fully gene-centric tool for annotating the effects of variants in a context-sensitive approach.


Schematic diagram of typical types of annotation corrections implemented in COPE. A rescued stop-gained SNV indicates that another SNV (‘A’ to ‘C’) in the same codon rescues a variant-centric stop-gained SNV (‘A’ to ‘T’). Stop-gained MNV indicates that two or more SNVs result in a stop codon (‘A’ to ‘T’ and ‘C’ to ‘G’). A rescued frameshift indel indicates that another indel in the same haplotype recovers the original open reading frame. A splicing-rescued stop-gained/frameshift variant indicates that a stop-gained or frameshift variant is rescued by a novel splicing isoform. A rescued splice-disrupting variant indicates that a splice-disrupting variant is rescued by a nearby cryptic site (as shown in the figure) or a novel splice site. The asterisk in the figure indicates a stop codon.
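The two codon-level corrections in this schematic can be illustrated with a short Python sketch. It is a hypothetical helper, not part of COPE: the function names and the returned labels are assumptions, and the SNVs are assumed to lie on the same haplotype.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def mutate(codon, snvs):
    """Return the codon after applying SNVs given as (offset 0-2, alt base)."""
    bases = list(codon)
    for offset, alt in snvs:
        bases[offset] = alt
    return "".join(bases)


def classify_codon(ref_codon, snvs):
    """Compare the variant-centric view (one SNV at a time) with the
    context-sensitive view (all SNVs of the codon applied together)."""
    individually_stop = [mutate(ref_codon, [snv]) in STOP_CODONS for snv in snvs]
    combined_stop = mutate(ref_codon, snvs) in STOP_CODONS
    if any(individually_stop) and not combined_stop:
        return "rescued stop-gained SNV"
    if combined_stop and not any(individually_stop):
        return "stop-gained MNV"
    if combined_stop:
        return "stop-gained"
    return "no stop gained"


# AAA -> TAA is a stop on its own, but AAA -> TCA (both SNVs) encodes serine.
print(classify_codon("AAA", [(0, "T"), (1, "C")]))
# Neither ACA -> TCA nor ACA -> AGA is a stop, but ACA -> TGA (both SNVs) is.
print(classify_codon("ACA", [(0, "T"), (1, "G")]))
```

The first call reports a rescued stop-gained SNV and the second a stop-gained MNV, which mirrors the false-positive and false-negative cases that a variant-by-variant annotation would produce.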

Evaluating the quality of COPE: availability, usability and flexibility

  • Free software
  • Publicly available online server and stand-alone package for large-scale analysis


Screenshot of the COPE web server. Example of input (A) and annotation by COPE (B)

  • Software documentation: A detailed guideline for installation and setup is available
  • Recent updates: COPE-PCG has been online since June 2016, and COPE-TFBS since March 2017 on a new website
  • Analysis of protein-coding genes (COPE-PCG), transcription factor binding sites (COPE-TFBS) and more… the COPE framework may also be extended and adapted to non-coding RNAs and miRNAs in the near future.

About the author

Dr Ge Gao is a principal investigator at the Center for Bioinformatics of Peking University. His team focuses primarily on developing novel computational techniques to analyze, integrate and visualize high-throughput biological data effectively and efficiently, with applications for deciphering the function and evolution of gene regulatory systems. Dr Ge Gao specializes in large-scale data mining, using a combination of statistical learning, high-performance computing, and data visualization.


Cheng et al., 2017. Accurately annotate compound effects of genetic variants using a context-sensitive framework. Nucleic Acids Research.

Cheng et al., in preparation. Systematically identify and annotate multiple-variant compound effect at transcription factor binding sites in the human genome.

Your Top CRISPR/Cas9 software tools


The development of CRISPR-Cas9 systems has revolutionized genome engineering in living organisms. This novel technology opens up a new era in genomics, along with a wide range of applications. Several bioinformatics tools have recently been developed for researchers designing CRISPR/Cas9 experiments, and analyzing and evaluating CRISPR/Cas9 genome editing.

A few weeks ago, we asked OMICtools members to choose their top 3 favorite CRISPR/Cas9 tools among those most used by the scientific community. Here are the results of your votes.

Gold medals for CRISPR-GA, CROP-IT and CRISPRTarget tools

Three web applications came out equally on top, each voted the number 1 tool by 45% of the users surveyed: CRISPR-GA (CRISPR Genome Analyzer), CROP-IT (CRISPR/Cas9 Off-target Prediction and Identification Tool) and CRISPRTarget.

The CRISPR-GA platform has become an essential tool for anyone wanting to assess the quality of their CRISPR/Cas9 experiment. It provides an easy (three mouse clicks), sensitive (detection limit 0.1%), and comprehensive analysis of gene editing results. The CRISPR-GA platform maps the reads, estimates and locates insertions and deletions, computes the allele replacement efficiency, and then provides you with a report integrating all this information.

CRISPR-GA pipeline. (A) From experiment to report. Schematic pipeline of a gene editing assessment. (B) Output of CRISPR-GA estimating a range of information. Deletions, insertions, homologous recombination (HR) and corresponding efficiencies. Upper panels estimate the number of insertions and deletions and each corresponding size. Middle panels estimate the number of insertions and deletions, and their corresponding location within the genomic locus of interest. The bottom panel shows the number of deletions and HRs at each corresponding location, and outputs the HR and NHEJ (non-homologous end-joining) efficiency. (C) Experimental results assessed by CRISPR-GA from testing several mutants of cas9, gRNAs and a DNA template. HR and NHEJ values are shown. From Güell et al., 2014. Genome editing assessment using CRISPR Genome Analyzer (CRISPR-GA). Bioinformatics.
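To make the notion of HR and NHEJ efficiency concrete, here is a deliberately crude Python sketch. It is not how CRISPR-GA works internally (the platform maps reads before calling indels); the classification rules, function names and toy sequences below are assumptions chosen only to show how the efficiencies reduce to read fractions.

```python
from collections import Counter


def classify_read(read, reference, hr_edited):
    """Assign a read to one category using simple string comparisons.

    Assumption: reads cover exactly the amplicon, so a length change
    relative to the reference is taken as an insertion or deletion.
    """
    if read == reference:
        return "unmodified"
    if read == hr_edited:
        return "HR"
    if len(read) != len(reference):
        return "NHEJ"
    return "other"


def editing_efficiencies(reads, reference, hr_edited):
    """Return the fraction of reads in each category."""
    if not reads:
        raise ValueError("no reads supplied")
    counts = Counter(classify_read(r, reference, hr_edited) for r in reads)
    labels = ("unmodified", "HR", "NHEJ", "other")
    return {label: counts[label] / len(reads) for label in labels}


# Toy amplicon and reads (hypothetical data).
REFERENCE = "ACGTACGT"
HR_EDITED = "ACGAACGT"   # intended allele replacement
READS = [REFERENCE] * 6 + [HR_EDITED] + ["ACGACGT"] * 2 + ["ACGTTACGT"]

print(editing_efficiencies(READS, REFERENCE, HR_EDITED))
```

With these ten toy reads, six are unmodified, one carries the intended replacement and three contain an indel, so the HR and NHEJ fractions come out at 0.1 and 0.3 respectively.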

  • CROP-IT (CRISPR/Cas9 Off-Target Prediction and Identification Tool)

CROP-IT is a user-friendly web application where users can design optimal sgRNA guiding sequences and search for potential off-target binding or cleavage sites. The CROP-IT tool integrates knowledge from experimentally identified Cas9 binding and cleavage sites, as well as information on chromatin state (data from multiple studies and 125 cell types). CROP-IT scores predicted off-target Cas9 binding and cleavage sites and outputs a list of the top sites.


Schematic of CROP-IT algorithm based on a computational model where each position of the guiding RNA sequence is differentially weighted based on experimental Cas9 binding and cleavage site information from multiple independent sources. Furthermore, it incorporates chromatin state information for the human genome by analyzing accessible chromatin regions from 125 human cell types. By integrating observed information from Cas9 DNA binding, CROP-IT performs significantly better than existing computational prediction tools. From Singh et al., 2015. Cas9-chromatin binding information enables more accurate CRISPR off-target prediction. Nucleic Acids Research. 
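The idea of weighting each guide position differently and then adjusting for chromatin state can be sketched as follows. This is an illustrative toy, not the CROP-IT model: the weights, the penalty for closed chromatin, the function names and the sequences are all invented for the example.

```python
def weighted_match_score(guide, site, weights):
    """Fraction of the total positional weight that matches the guide."""
    if not (len(guide) == len(site) == len(weights)):
        raise ValueError("guide, site and weights must have the same length")
    matched = sum(w for g, s, w in zip(guide, site, weights) if g == s)
    return matched / sum(weights)


def rank_sites(guide, candidates, weights, closed_chromatin_penalty=0.5, top_n=3):
    """Score candidate sites and return the top ones.

    candidates: list of (name, sequence, in_accessible_chromatin).
    Sites outside accessible chromatin are down-weighted by a
    hypothetical multiplicative penalty.
    """
    scored = []
    for name, sequence, accessible in candidates:
        score = weighted_match_score(guide, sequence, weights)
        if not accessible:
            score *= closed_chromatin_penalty
        scored.append((name, score))
    return sorted(scored, key=lambda item: item[1], reverse=True)[:top_n]


# Hypothetical 20-nt guide; the last 10 positions (PAM-proximal in this toy)
# are given three times the weight of the first 10.
GUIDE = "GACGTTACCGGATCAAGCTT"
WEIGHTS = [1] * 10 + [3] * 10
CANDIDATES = [
    ("distal_mismatch", "TACGTTACCGGATCAAGCTT", True),
    ("proximal_mismatch", "GACGTTACCGGATCAAGCTA", True),
    ("perfect_but_closed", GUIDE, False),
]

for name, score in rank_sites(GUIDE, CANDIDATES, WEIGHTS):
    print(name, score)
```

With these made-up weights, a mismatch at the low-weight end of the guide costs less than one at the high-weight end, and a perfect match in closed chromatin ranks last.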

CRISPRTarget is one of the first tools developed for predicting the targets of CRISPR RNA spacers. This web application interactively explores diverse databases. CRISPRTarget provides the flexibility to search for matches in either or both orientations of the input, and to discover targets with protospacer adjacent motifs, as well as any adjacent pairing potential.


Graphical output of CRISPRTarget. Output of a search for targets of the Streptococcus thermophilus DGCC7710 CRISPR array. The direction of transcription is known; however, both strands are shown in the diagram as if the direction of transcription were unknown. Two relatively low-scoring matches using these interactive settings are shown (rank 44–45). They have good spacer-protospacer base pairing but lack a WTTCTNN PAM. Match 45 is a match to a phage to which this strain is sensitive (Φ2972). Yellow indicates spacer/protospacer, blue shows flanking sequences, and mismatches between the crRNA and the target DNA protospacer are indicated in red. From Biswas et al., 2013. CRISPRTarget: bioinformatic prediction and analysis of crRNA targets. RNA Biology.
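Searching both orientations of a spacer and checking for an adjacent motif can be sketched in Python as below. This is a simplified illustration rather than CRISPRTarget's scoring scheme: the function names, the mismatch threshold, the toy sequences and the convention that the motif sits immediately 3' of the match (read on the strand of the match) are all assumptions.

```python
# IUPAC nucleotide codes, used to match degenerate motifs.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}


def reverse_complement(seq):
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def matches_motif(motif, seq):
    """True if seq is compatible with the degenerate motif."""
    return len(motif) == len(seq) and all(b in IUPAC[m] for m, b in zip(motif, seq))


def find_targets(spacer, target, max_mismatches=1, motif=None):
    """Scan both orientations of the spacer along the target.

    Returns (strand, start, mismatches, has_motif) tuples, where start is
    the 0-based position on the forward strand of the target.
    """
    hits = []
    for strand, query in (("+", spacer), ("-", reverse_complement(spacer))):
        for start in range(len(target) - len(query) + 1):
            end = start + len(query)
            mismatches = sum(a != b for a, b in zip(query, target[start:end]))
            if mismatches > max_mismatches:
                continue
            has_motif = None
            if motif is not None:
                if strand == "+":
                    flank = target[end:end + len(motif)]
                else:
                    # 3' of a reverse-strand match lies upstream on the forward strand.
                    left = start - len(motif)
                    flank = reverse_complement(target[left:start]) if left >= 0 else ""
                has_motif = matches_motif(motif, flank)
            hits.append((strand, start, mismatches, has_motif))
    return hits


# Toy data: a spacer, a target carrying it on the forward strand followed by
# a sequence that fits the illustrative motif, and the same target flipped.
SPACER = "GATTACAGG"
MOTIF = "NNAGAAW"
FORWARD_TARGET = "TTT" + SPACER + "CTAGAAT" + "GG"
REVERSE_TARGET = reverse_complement(FORWARD_TARGET)

print(find_targets(SPACER, FORWARD_TARGET, max_mismatches=0, motif=MOTIF))
print(find_targets(SPACER, REVERSE_TARGET, max_mismatches=0, motif=MOTIF))
```

The same site is found whichever way round the target is supplied, once on the "+" strand and once on the "-" strand, which is the practical benefit of searching both orientations when the direction of transcription is unknown.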

Silver medal for ZiFiT

Second place went to ZiFiT (Zinc Finger Targeter v4.1), with 36% of the votes.

Originally developed to identify potential zinc finger nuclease (ZFN) sites in target sequences, ZiFiT also supports the identification of CRISPR/Cas target sites and reagents, and provides user-friendly guidance for the construction of TALEN-encoding plasmids.

(Sander et al., 2010. ZiFiT (Zinc Finger Targeter): an updated zinc finger engineering tool. Nucleic Acids Research.)

Bronze medals for Crass and MAGeCK tools

Equal third place went to Crass (CRISPR Assembler) and MAGeCK (Model-based Analysis of Genome-wide CRISPR/Cas9 Knockout), with 31% of the votes each.

  • Crass (CRISPR Assembler)

Crass identifies and reconstructs CRISPR loci and spacers from raw metagenomic data without the need for assembly or prior knowledge of CRISPR in the data set. The sensitivity, specificity and speed of Crass facilitate the analysis of metagenomic data, phage-host interactions and co-evolution within microbial communities.


Comparison between different CRISPR loci visualization techniques. (A) Traditional approach to visualization where the spacers are shown as differently colored rectangles (the same color refers to the same spacer) anchored to the leader sequence (white triangle). (B) The same CRISPR loci reconstructed by Crass into a spacer graph. From Skennerton et al., 2013. Crass: identification and reconstruction of CRISPR from unassembled metagenomic data. Nucleic Acids Research.
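A spacer graph of the kind shown in panel (B) can be pictured as a directed graph in which each spacer is a node and an edge links two spacers seen next to each other. The Python sketch below builds such a graph from already-identified spacer orders; it is a hypothetical illustration of the data structure, not the Crass algorithm, and the function name and spacer labels are assumptions.

```python
from collections import defaultdict


def build_spacer_graph(spacer_orders):
    """Build a directed graph from observed spacer orders.

    spacer_orders: list of lists, each giving the spacers found on one
    read (or locus variant) in the order they occur.
    Returns a dict mapping each spacer to the sorted list of spacers
    observed immediately after it.
    """
    graph = defaultdict(set)
    for order in spacer_orders:
        for spacer in order:
            graph[spacer]  # make sure every spacer appears as a node
        for left, right in zip(order, order[1:]):
            graph[left].add(right)
    return {spacer: sorted(followers) for spacer, followers in graph.items()}


# Three hypothetical observations of the same locus in a mixed community.
OBSERVATIONS = [
    ["s1", "s2", "s3"],
    ["s2", "s3", "s4"],
    ["s1", "s2", "s5"],
]

for spacer, followers in build_spacer_graph(OBSERVATIONS).items():
    print(spacer, "->", followers)
```

Here spacer s2 is followed by either s3 or s5, so the graph branches at s2. Branches of this kind are what a linear, one-array-at-a-time drawing as in panel (A) cannot show compactly.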

  • MAGeCK (Model-based Analysis of Genome-wide CRISPR/Cas9 Knockout)

The MAGeCK algorithm was developed by Li et al. (Genome Biol. 2014) for prioritizing single-guide RNAs, genes and pathways in genome-scale CRISPR/Cas9 knockout screens. It identifies both positively and negatively selected genes simultaneously, and reports robust results across different experimental conditions. This computational method, with a low false discovery rate (FDR) and high sensitivity, brings new clues for answering biological questions and addressing therapeutic needs. 

Follow this tutorial to see how the MAGeCK algorithm works.

Stay tuned for more feedback from the OMICtools community on the latest and best tools to use!

Analyzing population genomics using ENDOG software

Dr. Juan Pablo Gutiérrez, developer of the ENDOG software, talks here about his tool and how you can use it to perform various demographic and genealogical analyses of genomics data.


ENDOG: one of the most popular software tools for genealogical analysis

The program ENDOG has become one of the most popular software tools for genealogical analyses. It includes not only the computation of classical parameters in population genetics but also new parameters based on the computation of the individual increase in inbreeding or co-ancestry.


Inbreeding per Generation submenu screen

ENDOG allows you to conduct several demographic and genetic analyses including:

  • Individual inbreeding and average relatedness coefficients
  • Effective population size
  • Parameters characterizing the concentration of the origin of both genes and individuals, such as the effective number of founders and ancestors, the effective number of founder herds, etc.
  • F-statistics and paired genetic distances for each subpopulation under study
  • Descriptors of the genetic importance of herds in a population
  • Generation intervals
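As an illustration of the first item in this list, individual inbreeding coefficients can be obtained from a pedigree with the classical tabular method for the additive relationship matrix. The Python sketch below is not ENDOG's code: the function names and the toy pedigree are assumptions, and average relatedness is taken here to be the mean of each individual's row of the relationship matrix (itself included), which is how the parameter is usually described.

```python
def relationship_matrix(pedigree):
    """Additive relationship matrix by the tabular method.

    pedigree: list of (id, sire, dam) with parents listed before their
    offspring and None for unknown parents.
    """
    index = {animal: i for i, (animal, _, _) in enumerate(pedigree)}
    n = len(pedigree)
    matrix = [[0.0] * n for _ in range(n)]
    for i, (_, sire, dam) in enumerate(pedigree):
        s, d = index.get(sire), index.get(dam)
        for j in range(i):
            # Relationship with an older animal: mean of the parents' relationships.
            value = 0.0
            if s is not None:
                value += 0.5 * matrix[j][s]
            if d is not None:
                value += 0.5 * matrix[j][d]
            matrix[i][j] = matrix[j][i] = value
        # Diagonal: 1 + F, where F is half the relationship between the parents.
        both_known = s is not None and d is not None
        matrix[i][i] = 1.0 + (0.5 * matrix[s][d] if both_known else 0.0)
    return matrix


def inbreeding_coefficients(pedigree):
    matrix = relationship_matrix(pedigree)
    return {animal: matrix[i][i] - 1.0 for i, (animal, _, _) in enumerate(pedigree)}


def average_relatedness(pedigree):
    matrix = relationship_matrix(pedigree)
    n = len(pedigree)
    return {animal: sum(matrix[i]) / n for i, (animal, _, _) in enumerate(pedigree)}


# Toy pedigree: E is the offspring of full sibs C and D.
PEDIGREE = [
    ("A", None, None),
    ("B", None, None),
    ("C", "A", "B"),
    ("D", "A", "B"),
    ("E", "C", "D"),
]

print(inbreeding_coefficients(PEDIGREE))
print(average_relatedness(PEDIGREE))
```

For this pedigree the offspring of the full-sib mating, E, has an inbreeding coefficient of 0.25, while all other animals are non-inbred.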

The program helps breeders and researchers to monitor changes in genetic variability and population structure, with limited dataset preparation costs.

The current version of ENDOG calculates effective population size using several methodologies, including regression approaches and, in particular, a calculation based on the individual increase in inbreeding, modified to account for the avoidance of self-fertilization.
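The calculation from the individual increase in inbreeding can be sketched as follows. The formulas used here are one commonly cited formulation and are stated as an assumption, since ENDOG's exact implementation may differ: for each individual with inbreeding coefficient F and t equivalent complete generations, ΔF = 1 − (1 − F)^(1/(t − 1)), where the exponent uses t − 1 to reflect the avoidance of self-fertilization, and the effective size is Ne = 1 / (2 × mean ΔF). The function names and the example values are hypothetical.

```python
def individual_increase_in_inbreeding(f, t):
    """Delta F for one individual.

    f: inbreeding coefficient of the individual (0 <= f < 1).
    t: equivalent complete generations traced for the individual (t > 1).
    """
    if t <= 1:
        raise ValueError("t must be greater than 1")
    return 1.0 - (1.0 - f) ** (1.0 / (t - 1.0))


def realized_effective_size(records):
    """Effective size from (f, t) pairs of a reference population."""
    deltas = [individual_increase_in_inbreeding(f, t) for f, t in records]
    mean_delta = sum(deltas) / len(deltas)
    if mean_delta <= 0:
        return float("inf")  # no inbreeding accumulated in these pedigrees
    return 1.0 / (2.0 * mean_delta)


# Hypothetical individuals: (inbreeding coefficient, equivalent generations).
RECORDS = [(0.0975, 3.0), (0.0975, 3.0)]
print(realized_effective_size(RECORDS))
```

In this toy example each individual has ΔF of about 0.05, giving an effective size of about 10. Because ΔF is scaled by the pedigree depth of each individual, animals with different amounts of known ancestry can be averaged together.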


ENDOG Individual Pedigree submenu screen

Highlights of the ENDOG program

Why has ENDOG become popular software for scientists and breeders?

Using ENDOG for your genetic analyses allows you to fit your data to real-world populations. It is specifically designed for analyzing diploid populations in which selfing is not possible.

It allows you to compute reliable genetic parameters, particularly effective population sizes, even when pedigrees are shallow (having accumulated, on average, three or more generations). The authors have also made available a version of the program in which selfing is possible to allow plant breeders to carry out genealogical analyses.

What about ENDOG quality: availability, usability and flexibility?

  • Free software: ENDOG is freely available. You can download the latest version, 4.8 (10 November 2010, in English), an additional file for the Selfing Version (28 September 2011), and Endog 4.0 (15 November 2006, in Spanish).
  • Intuitive: Users can upload bulk pedigree data with a limited need for formatting. ENDOG provides tools to help users check for errors. Interface, Access Tables and .txt files generated by ENDOG are user-friendly and self-informative.
  • Software documentation: The ENDOG users’ guide provides information on the methods implemented in the software, but also gives tips to help any users trying out the software for the first time via the ENDOG interface. In addition, the authors are known to be very responsive to users’ questions.
  • A compiled version of ENDOG is available for the Microsoft Windows environment only.
  • Tested in several studies: The ENDOG program has been cited 191 times in papers indexed in the Web of Knowledge.


ENDOG Founders submenu screen

About the author

Dr. Juan Pablo Gutiérrez, DVM, graduated from the UCM (Complutense University of Madrid) in Spain at the School of Veterinary Medicine in 1987, and completed a post-graduate degree at the same University in 1991. He also completed a degree in computer engineering at the UNED (National University of Distance Education) in Spain, with a specialization in Animal Breeding in 1989.

He is currently a Full Professor at the Department of Animal Production at the UCM, and the Director of the UCM Consolidated Research Group MOSEVAR (Animal Selection and Genetic Evaluation Models). As of July 2017, he has 30 years of experience in the field of animal breeding, and has published approximately 200 research papers, 91 of which are published in journals appearing in JCR (Journal Citation Reports). He has worked in genetic evaluation in multiple species including many breeds of cattle, sheep, mice, alpacas and horses.

Felix Goyache and Isabel Cervantes are co-authors.


Gutiérrez and Goyache, 2005. A note on ENDOG: a computer program for analysing pedigree information. Journal of Animal Breeding and Genetics.