Bioinformatics and Systems Medicine Laboratory

This page lists computational methods for CNV detection from whole-genome and whole-exome sequencing data, to help users find the right tools for calling CNVs. The resulting CNV calls can be submitted to our CNVannotator for further annotation.

As summarized in Figure 1 below, NGS-based CNV detection methods can be categorized into five strategies: (1) paired-end mapping (PEM), (2) split read (SR), (3) read depth (RD), (4) de novo assembly of a genome (AS), and (5) a combination of the above approaches (CB).

The paired-end mapping (PEM) strategy detects CNVs through discordantly mapped read pairs. A pair is discordant if the distance between its two mapped ends differs significantly from the average insert size. By contrast, split read (SR)-based methods use the incompletely mapped read of a read pair to identify small CNVs. The read depth (RD)-based approach detects CNVs by counting the number of reads mapped to each genomic region; in the figure, reads are mapped to three exome regions. The assembly (AS)-based approach detects CNVs by mapping assembled contigs to the reference genome. The combinatorial approach (CB) combines two or more of the above four strategies; the example in the figure combines RD and PEM information to detect CNVs. Each strategy has its own advantages and limitations. Although there has been great progress in each category, no single method can comprehensively detect all types of CNVs. As summarized in Tables 1-4 below, there are 6 PEM-based tools, 4 SR-based tools, 26 RD-based tools (Tables 2 and 3), 3 AS-based tools, and 9 tools for combinatorial approaches.
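The discordance test behind the PEM strategy can be illustrated with a minimal Python sketch. It is not the algorithm of any tool listed below: the function name `classify_pairs`, the cutoff `k`, and the labels are hypothetical, and it assumes that "significantly different" means more than `k` standard deviations from the mean insert size of the library.

```python
from statistics import mean, stdev


def classify_pairs(insert_sizes, library_sizes, k=3.0):
    """Label read pairs by comparing their mapped span to the library insert sizes.

    insert_sizes  -- mapped distances of the pairs to classify
    library_sizes -- mapped distances sampled from normally mapped pairs
    k             -- hypothetical cutoff, in standard deviations
    """
    mu = mean(library_sizes)
    sd = stdev(library_sizes)
    labels = []
    for size in insert_sizes:
        if size > mu + k * sd:
            # Ends map farther apart than expected: sequence is missing in the sample.
            labels.append("deletion-like")
        elif size < mu - k * sd:
            # Ends map closer together than expected: extra sequence in the sample.
            labels.append("insertion-like")
        else:
            labels.append("concordant")
    return labels


library = [290, 300, 310, 295, 305, 300, 298, 302]
print(classify_pairs([301, 1200, 120], library))
```

Real PEM callers also use read orientation and cluster several discordant pairs before reporting a variant; this sketch covers only the insert-size comparison described above.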


Table 1 - Summary of paired-end mapping (PEM), split read (SR), and de novo assembly (AS)-based tools for CNV detection using NGS data

| Method | URL | Language | Input | Comments | Ref. |
|---|---|---|---|---|---|
| PEM-based | | | | | |
| BreakDancer | http://breakdancer.sourceforge.net/ | Perl, C++ | Alignment files | Predicting insertions, deletions, inversions, inter- and intra-chromosomal translocations | [1] |
| PEMer | http://sv.gersteinlab.org/pemer/ | Perl, Python | FASTA | Using simulation-based error models to call SVs | [2] |
| VariationHunter | http://compbio.cs.sfu.ca/strvar.htm | C | DIVET^a | Detecting insertions, deletions and inversions | [3] |
| commonLAW | http://compbio.cs.sfu.ca/strvar.htm | C++ | Alignment files | Aligning multiple samples simultaneously to gain accurate SVs using maximum parsimony model | [4] |
| GASV | http://code.google.com/p/gasv/ | Java | BAM | A geometric approach for classification and comparison of structural variants | [5] |
| Spanner | N/A | N/A | N/A | Using PEM to detect tandem duplications | [6] |
| SR-based | | | | | |
| AGE | http://sv.gersteinlab.org/age | C++ | FASTA | A dynamic-programming algorithm using optimal alignments with gap excision to detect breakpoints | [7] |
| Pindel | http://www.ebi.ac.uk/~kye/pindel/ | C++ | BAM / FASTQ | Using a pattern growth approach to identify breakpoints of various SVs | [8] |
| SLOPE | http://www-genepi.med.utah.edu/suppl/SLOPE | C++ | SAM / FASTQ / MAQ^b | Locating SVs from targeted sequencing data | [9] |
| SRiC | N/A | N/A | BLAT output | Calibrating SV calling using realistic error models | [10] |
| AS-based | | | | | |
| Magnolya | http://sourceforge.net/projects/magnolya/ | Python | FASTA | Calling CNV from co-assembled genomes and estimating copy number with Poisson mixture model | [11] |
| Cortex assembler | http://cortexassembler.sourceforge.net/ | C | FASTQ / FASTA | Using alignment of de novo assembled genome to build de Bruijn graph to detect SVs | [12] |
| TIGRA-SV | http://gmt.genome.wustl.edu/tigra-sv/ | C | SV calls^c + BAM | Local assembly of SVs using the iterative graph routing assembly (TIGRA) algorithm | N/A |

^a The specific input format for VariationHunter, which includes reads with multiple alignments.

^b File format from MAQ mapview.

^c A file containing structural variations detected by other tools.


Table 2 - Read depth (RD)-based tools for CNV detection using whole genome sequencing data

| Tool | URL | Language | Input | Comments | Ref. |
|---|---|---|---|---|---|
| SegSeq^a | http://www.broadinstitute.org/cgi-bin/cancer/publications/pub_paper.cgi?mode=view&paper_id=182 | Matlab | Aligned read positions | Detecting CNV breakpoints using massively parallel sequence data | [13] |
| CNV-seq^a | http://tiger.dbs.nus.edu.sg/cnv-seq/ | Perl, R | Aligned read positions | Identifying CNVs using the difference of observed copy number ratios | [14] |
| RDXplorer^b | http://rdxplorer.sourceforge.net/ | Python, Shell | BAM | Detecting CNVs through event-wise testing algorithm on normalized read depth of coverage | [15] |
| BIC-seq^a | http://compbio.med.harvard.edu/Supplements/PNAS11.html | Perl, R | BAM | Using the Bayesian information criterion to detect CNVs based on uniquely mapped reads | [16] |
| CNAseg^a | http://www.compbio.group.cam.ac.uk/software/cnaseg | R | BAM | Using flowcell-to-flowcell variability in cancer and control samples to reduce false positives | [17] |
| cn.MOPS^b | http://www.bioinf.jku.at/software/cnmops/ | R | BAM / read count matrices | Modelling of read depths across samples at each genomic position using mixture Poisson model | [18] |
| JointSLM^b | http://nar.oxfordjournals.org/content/suppl/2011/02/16/gkr068.DC1/JointSLM_R_Package.zip | R | SAM / BAM | Population-based approach to detect common CNVs using read depth data | [19] |
| ReadDepth | http://code.google.com/p/readdepth/ | R | BED files | Using breakpoints to increase the resolution of CNV detection from low-coverage reads | [20] |
| rSW-seq^a | http://compbio.med.harvard.edu/Supplements/BMCBioinfo10-2.html | C | Aligned read positions | Identifying CNVs by comparing matched tumor and control sample | [21] |
| CNVnator | http://sv.gersteinlab.org/ | C++ | BAM | Using mean-shift approach and performing multiple-bandwidth partitioning and GC correction | [22] |
| CNAnorm^a | http://www.precancer.leeds.ac.uk/cnanorm | R | Aligned read positions | Identifying contamination level with normal cells | [23] |
| CMDS^b | https://dsgweb.wustl.edu/qunyuan/software/cmds | C, R | Aligned read positions | Discovering CNVs from multiple samples | [24] |
| mrCaNaVar | http://mrcanavar.sourceforge.net/ | C | SAM | A tool to detect large segmental duplications and insertions | [25] |
| CNVeM | N/A | N/A | N/A | Predicting CNV breakpoints in base-pair resolution | [26] |
| cnvHMM | http://genome.wustl.edu/software/cnvhmm | C | Consensus sequence from SAMtools | Using HMM to detect CNV | N/A |

^a Tools that require matched case-control samples as input.

^b Tools that use multiple samples as input.
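The windowed read counting shared by the matched case-control tools in Table 2 can be sketched in a few lines of Python. This illustrates the general idea only, not the method of any listed tool; the function names, the fixed window size, the library-size scaling, and the pseudocount are all assumptions made for the example.

```python
import math
from collections import Counter


def window_counts(read_starts, window_size):
    """Count reads per fixed-size window, keyed by window index."""
    return Counter(pos // window_size for pos in read_starts)


def log2_ratios(case_starts, control_starts, window_size, pseudocount=0.5):
    """Return the log2 case/control read-depth ratio for each window."""
    case = window_counts(case_starts, window_size)
    control = window_counts(control_starts, window_size)
    # Scale the case counts so that samples sequenced to different depths are comparable.
    scale = len(control_starts) / len(case_starts)
    ratios = {}
    for w in sorted(set(case) | set(control)):
        # The pseudocount (an assumption) avoids division by zero in empty windows.
        ratios[w] = math.log2((case[w] * scale + pseudocount) / (control[w] + pseudocount))
    return ratios


# Toy data: 10 reads in each of 10 windows; the case sample has 10 extra reads in window 4.
control = [w * 1000 + i for w in range(10) for i in range(10)]
case = control + [4500 + i for i in range(10)]
ratios = log2_ratios(case, control, window_size=1000)
for window, ratio in ratios.items():
    print(window, round(ratio, 2))
```

In this toy data, window 4 has a clearly positive log2 ratio while the other windows stay near zero. The tools in Table 2 add steps this sketch omits, such as GC correction, segmentation, and statistical testing.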


Table 3 - Summary of bioinformatics tools for CNV detection using exome sequencing data

| Tool | URL | Language | Input | Comments | Ref. |
|---|---|---|---|---|---|
| Control-FREEC^a | http://bioinfo-out.curie.fr/projects/freec/ | C++ | SAM, BAM, pileup, Eland, BED, SOAP, arachne, psi (BLAT) and Bowtie formats | Correcting copy number using matched case-control samples or GC contents | [27] |
| CoNIFER^b | http://conifer.sf.net/ | Python | BAM | Using singular value decomposition to normalize copy number and avoiding batch bias by integrating multiple samples | [28] |
| XHMM^b | http://atgu.mgh.harvard.edu/xhmm/ | C++ | BAM | Using principal component analysis to normalize copy number and HMM to detect CNVs | [29] |
| ExomeCNV^c | http://cran.r-project.org/web/packages/ExomeCNV | R | BAM / pileup | Using read depth and B-allele frequencies from exome sequencing data to detect CNVs and LOHs | [30] |
| CONTRA^c | http://contra-cnv.sourceforge.net/ | Python | SAM / BAM | Comparing base-level log-ratios calculated from read depth between case and control samples | [31] |
| CONDEX | http://code.google.com/p/condr/ | Java | Sorted BED files | Using HMM to identify CNVs | [32] |
| SeqGene | http://seqgene.sourceforge.net | Python, R | SAM / pileup | Calling variants, including CNVs, from exome sequencing data | [33] |
| PropSeq^c | http://bioinformatics.nki.nl/ocs/ | R, C | N/A | Using the read depth of the case sample as a linear function of that of control sample to detect CNVs | [34] |
| VarScan2^c | http://genome.wustl.edu/software/varscan | Java | BAM / pileup | Using pairwise comparisons of the normalized read depth at each position to estimate CNV | [35] |
| ExoCNVTest^b | http://www1.imperial.ac.uk/medicine/people/l.coin/ | Java, R | BAM | Identifying and genotyping common CNVs associated with complex disease | [36] |
| ExomeDepth^b | http://cran.r-project.org/web/packages/ExomeDepth/index.html | R | BAM | Using beta-binomial model to fit read depth of WES data | [37] |

^a Control-FREEC accepts either matched case-control samples or a single sample as input.

^b Tools that use multiple samples as input.

^c Tools that require matched case-control samples as input.
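Table 3 mentions modelling the case read depth as a linear function of the control read depth. The sketch below illustrates that idea with an ordinary least-squares fit in plain Python. It is not the implementation of PropSeq or of any other listed tool; the function names and the toy exon depths are hypothetical.

```python
def fit_line(x, y):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def depth_ratios(control_depth, case_depth):
    """Observed case depth divided by the depth predicted from the control sample."""
    slope, intercept = fit_line(control_depth, case_depth)
    return [c / (slope * k + intercept) for k, c in zip(control_depth, case_depth)]


# Toy per-exon depths: the case is sequenced at about half the control depth,
# and the exon at index 4 has roughly half of the depth expected at that level.
control_depth = [100, 200, 150, 300, 250, 120]
case_depth = [52, 98, 75, 151, 63, 61]
ratios = depth_ratios(control_depth, case_depth)
print([round(r, 2) for r in ratios])
```

Exons whose ratio falls well below 1 are candidate losses, and those well above 1 are candidate gains. Because the CNV exons themselves pull an ordinary least-squares line toward them, a robust regression is likely a better choice on real data.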


Table 4 - Combinatorial bioinformatics tools for CNV detection using NGS data

| Method | URL | Language | Input | Combination^a | Ref. |
|---|---|---|---|---|---|
| NovelSeq | http://compbio.cs.sfu.ca/strvar.htm | C | FASTA / SAM | PEM+AS | [38] |
| HYDRA | http://code.google.com/p/hydra-sv/ | Python | Discordant paired-end mappings | PEM+AS | [39] |
| CNVer | http://compbio.cs.toronto.edu/CNVer/ | Perl, C++ | BAM / aligned positions | PEM+RD | [40] |
| GASVPro | http://code.google.com/p/gasv/ | C++ | BAM | PEM+RD | [41] |
| Genome STRiP | http://www.broadinstitute.org/software/genomestrip/genome-strip | Java, R | BAM | PEM+RD | [42] |
| SVDetect | http://svdetect.sourceforge.net/ | Perl | SAM / BAM / ELAND | PEM+RD | [43] |
| inGAP-sv | http://ingap.sourceforge.net/ | Java | SAM | PEM+RD | [44] |
| SVseq | http://www.engr.uconn.edu/~jiz08001/svseq.html | C | FASTQ / BAM | PEM+SR | [45] |
| Nord et al. | N/A | N/A | N/A | RD+SR | [46] |

^a RD: read depth-based approach; PEM: paired-end mapping approach; SR: split read approach; AS: de novo assembly approach.
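One simple way to combine two strategies, such as the PEM+RD tools in Table 4, is to keep only the calls that both lines of evidence support. The Python sketch below does this with a reciprocal-overlap test. It is an assumption-laden illustration rather than the method of any listed tool: the `(chrom, start, end, type)` tuple layout, the function names, and the 50% threshold are all hypothetical.

```python
def reciprocal_overlap(a, b):
    """Smaller of the two overlap fractions for (start, end) intervals a and b."""
    overlap = min(a[1], b[1]) - max(a[0], b[0])
    if overlap <= 0:
        return 0.0
    return min(overlap / (a[1] - a[0]), overlap / (b[1] - b[0]))


def supported_calls(pem_calls, rd_calls, min_overlap=0.5):
    """Keep PEM calls confirmed by an RD call of the same type on the same chromosome."""
    supported = []
    for pem in pem_calls:
        for rd in rd_calls:
            same_event = pem[0] == rd[0] and pem[3] == rd[3]
            if same_event and reciprocal_overlap(pem[1:3], rd[1:3]) >= min_overlap:
                # Report the PEM coordinates of the confirmed call.
                supported.append(pem)
                break
    return supported


pem_calls = [("chr1", 1000, 5000, "DEL"), ("chr2", 200, 900, "DEL")]
rd_calls = [("chr1", 1200, 5200, "DEL"), ("chr2", 5000, 9000, "DUP")]
print(supported_calls(pem_calls, rd_calls))
```

Requiring agreement in this way trades sensitivity for fewer false positives. The published combinatorial tools generally integrate the signals more closely, for example within a single probabilistic model.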


References

1.         Chen K, Wallis JW, McLellan MD, Larson DE, Kalicki JM, Pohl CS, McGrath SD, Wendl MC, Zhang QY, Locke DP, et al: BreakDancer: an algorithm for high-resolution mapping of genomic structural variation. Nat Methods 2009, 6:677-681.

2.         Korbel JO, Abyzov A, Mu XJ, Carriero N, Cayting P, Zhang ZD, Snyder M, Gerstein MB: PEMer: a computational framework with simulation-based error models for inferring genomic structural variants from massive paired-end sequencing data. Genome Biol 2009, 10:R23.

3.         Hormozdiari F, Hajirasouliha I, Dao P, Hach F, Yorukoglu D, Alkan C, Eichler EE, Sahinalp SC: Next-generation VariationHunter: combinatorial algorithms for transposon insertion discovery. Bioinformatics 2010, 26:i350-357.

4.         Hormozdiari F, Hajirasouliha I, McPherson A, Eichler EE, Sahinalp SC: Simultaneous structural variation discovery among multiple paired-end sequenced genomes. Genome Res 2011, 21:2203-2212.

5.         Sindi S, Helman E, Bashir A, Raphael BJ: A geometric approach for classification and comparison of structural variants. Bioinformatics 2009, 25:i222-230.

6.         Mills RE, Walter K, Stewart C, Handsaker RE, Chen K, Alkan C, Abyzov A, Yoon SC, Ye K, Cheetham RK, et al: Mapping copy number variation by population-scale genome sequencing. Nature 2011, 470:59-65.

7.         Abyzov A, Gerstein M: AGE: defining breakpoints of genomic structural variants at single-nucleotide resolution, through optimal alignments with gap excision. Bioinformatics 2011, 27:595-603.

8.         Ye K, Schulz MH, Long Q, Apweiler R, Ning Z: Pindel: a pattern growth approach to detect break points of large deletions and medium sized insertions from paired-end short reads. Bioinformatics 2009, 25:2865-2871.

9.         Abel HJ, Duncavage EJ, Becker N, Armstrong JR, Magrini VJ, Pfeifer JD: SLOPE: a quick and accurate method for locating non-SNP structural variation from targeted next-generation sequence data. Bioinformatics 2010, 26:2684-2688.

10.       Zhang ZD, Du J, Lam H, Abyzov A, Urban AE, Snyder M, Gerstein M: Identification of genomic indels and structural variations using split reads. BMC Genomics 2011, 12:375.

11.       Nijkamp JF, van den Broek MA, Geertman JM, Reinders MJ, Daran JM, de Ridder D: De novo detection of copy number variation by co-assembly. Bioinformatics 2012.

12.       Iqbal Z, Caccamo M, Turner I, Flicek P, McVean G: De novo assembly and genotyping of variants using colored de Bruijn graphs. Nat Genet 2012, 44:226-232.

13.       Chiang DY, Getz G, Jaffe DB, O'Kelly MJ, Zhao X, Carter SL, Russ C, Nusbaum C, Meyerson M, Lander ES: High-resolution mapping of copy-number alterations with massively parallel sequencing. Nat Methods 2009, 6:99-103.

14.       Xie C, Tammi MT: CNV-seq, a new method to detect copy number variation using high-throughput sequencing. BMC Bioinformatics 2009, 10:80.

15.       Yoon S, Xuan Z, Makarov V, Ye K, Sebat J: Sensitive and accurate detection of copy number variants using read depth of coverage. Genome Res 2009, 19:1586-1592.

16.       Xi R, Hadjipanayis AG, Luquette LJ, Kim TM, Lee E, Zhang J, Johnson MD, Muzny DM, Wheeler DA, Gibbs RA, et al: Copy number variation detection in whole-genome sequencing data using the Bayesian information criterion. Proc Natl Acad Sci U S A 2011, 108:E1128-1136.

17.       Ivakhno S, Royce T, Cox AJ, Evers DJ, Cheetham RK, Tavare S: CNAseg--a novel framework for identification of copy number changes in cancer from second-generation sequencing data. Bioinformatics 2010, 26:3051-3058.

18.       Klambauer G, Schwarzbauer K, Mayr A, Clevert DA, Mitterecker A, Bodenhofer U, Hochreiter S: cn.MOPS: mixture of Poissons for discovering copy number variations in next-generation sequencing data with a low false discovery rate. Nucleic Acids Res 2012, 40:e69.

19.       Magi A, Benelli M, Yoon S, Roviello F, Torricelli F: Detecting common copy number variants in high-throughput sequencing data by using JointSLM algorithm. Nucleic Acids Res 2011, 39:e65.

20.       Miller CA, Hampton O, Coarfa C, Milosavljevic A: ReadDepth: a parallel R package for detecting copy number alterations from short sequencing reads. PLoS One 2011, 6:e16327.

21.       Kim TM, Luquette LJ, Xi R, Park PJ: rSW-seq: algorithm for detection of copy number alterations in deep sequencing data. BMC Bioinformatics 2010, 11:432.

22.       Abyzov A, Urban AE, Snyder M, Gerstein M: CNVnator: an approach to discover, genotype, and characterize typical and atypical CNVs from family and population genome sequencing. Genome Res 2011, 21:974-984.

23.       Gusnanto A, Wood HM, Pawitan Y, Rabbitts P, Berri S: Correcting for cancer genome size and tumour cell content enables better estimation of copy number alterations from next-generation sequence data. Bioinformatics 2012, 28:40-47.

24.       Zhang Q, Ding L, Larson DE, Koboldt DC, McLellan MD, Chen K, Shi X, Kraja A, Mardis ER, Wilson RK, et al: CMDS: a population-based method for identifying recurrent DNA copy number aberrations in cancer from high-resolution data. Bioinformatics 2010, 26:464-469.

25.       Alkan C, Kidd JM, Marques-Bonet T, Aksay G, Antonacci F, Hormozdiari F, Kitzman JO, Baker C, Malig M, Mutlu O, et al: Personalized copy number and segmental duplication maps using next-generation sequencing. Nat Genet 2009, 41:1061-1067.

26.       Wang Z, Hormozdiari F, Yang W-Y, Halperin E, Eskin E: CNVeM: Copy Number Variation Detection Using Uncertainty of Read Mapping. In Research in Computational Molecular Biology. Volume 7262. Edited by Chor B: Springer Berlin / Heidelberg; 2012: 326-340. (Lecture Notes in Computer Science).

27.       Boeva V, Zinovyev A, Bleakley K, Vert JP, Janoueix-Lerosey I, Delattre O, Barillot E: Control-free calling of copy number alterations in deep-sequencing data using GC-content normalization. Bioinformatics 2011, 27:268-269.

28.       Krumm N, Sudmant PH, Ko A, O'Roak BJ, Malig M, Coe BP, Quinlan AR, Nickerson DA, Eichler EE: Copy number variation detection and genotyping from exome sequence data. Genome Res 2012, 22:1525-1532.

29.       Fromer M, Moran JL, Chambert K, Banks E, Bergen SE, Ruderfer DM, Handsaker RE, McCarroll SA, O'Donovan MC, Owen MJ, et al: Discovery and Statistical Genotyping of Copy-Number Variation from Whole-Exome Sequencing Depth. Am J Hum Genet 2012, 91:597-607.

30.       Sathirapongsasuti JF, Lee H, Horst BA, Brunner G, Cochran AJ, Binder S, Quackenbush J, Nelson SF: Exome sequencing-based copy-number variation and loss of heterozygosity detection: ExomeCNV. Bioinformatics 2011, 27:2648-2654.

31.       Li J, Lupat R, Amarasinghe KC, Thompson ER, Doyle MA, Ryland GL, Tothill RW, Halgamuge SK, Campbell IG, Gorringe KL: CONTRA: copy number analysis for targeted resequencing. Bioinformatics 2012, 28:1307-1313.

32.       Ramachandran A, Micsinai M, Pe'er I: CONDEX: Copy number detection in exome sequences. In Bioinformatics and Biomedicine Workshops (BIBMW), 2011 IEEE International Conference on; 12-15 Nov. 2011. 2011: 87-93.

33.       Deng X: SeqGene: a comprehensive software solution for mining exome- and transcriptome-sequencing data. BMC Bioinformatics 2011, 12:267.

34.       Rigaill GJ, Cadot S, Kluin RJ, Xue Z, Bernards R, Majewski IJ, Wessels LF: A regression model for estimating DNA copy number applied to capture sequencing data. Bioinformatics 2012, 28:2357-2365.

35.       Koboldt DC, Zhang Q, Larson DE, Shen D, McLellan MD, Lin L, Miller CA, Mardis ER, Ding L, Wilson RK: VarScan 2: somatic mutation and copy number alteration discovery in cancer by exome sequencing. Genome Res 2012, 22:568-576.

36.       Coin LJ, Cao D, Ren J, Zuo X, Sun L, Yang S, Zhang X, Cui Y, Li Y, Jin X, Wang J: An exome sequencing pipeline for identifying and genotyping common CNVs associated with disease with application to psoriasis. Bioinformatics 2012, 28:i370-i374.

37.       Plagnol V, Curtis J, Epstein M, Mok KY, Stebbings E, Grigoriadou S, Wood NW, Hambleton S, Burns SO, Thrasher AJ, et al: A robust model for read count data in exome sequencing experiments and implications for copy number variant calling. Bioinformatics 2012, 28:2747-2754.

38.       Hajirasouliha I, Hormozdiari F, Alkan C, Kidd JM, Birol I, Eichler EE, Sahinalp SC: Detection and characterization of novel sequence insertions using paired-end next-generation sequencing. Bioinformatics 2010, 26:1277-1283.

39.       Quinlan AR, Clark RA, Sokolova S, Leibowitz ML, Zhang Y, Hurles ME, Mell JC, Hall IM: Genome-wide mapping and assembly of structural variant breakpoints in the mouse genome. Genome Res 2010, 20:623-635.

40.       Medvedev P, Fiume M, Dzamba M, Smith T, Brudno M: Detecting copy number variation with mated short reads. Genome Res 2010, 20:1613-1622.

41.       Sindi SS, Onal S, Peng LC, Wu HT, Raphael BJ: An integrative probabilistic model for identification of structural variation in sequencing data. Genome Biol 2012, 13:R22.

42.       Handsaker RE, Korn JM, Nemesh J, McCarroll SA: Discovery and genotyping of genome structural polymorphism by sequencing on a population scale. Nat Genet 2011, 43:269-276.

43.       Zeitouni B, Boeva V, Janoueix-Lerosey I, Loeillet S, Legoix-ne P, Nicolas A, Delattre O, Barillot E: SVDetect: a tool to identify genomic structural variations from paired-end and mate-pair sequencing data. Bioinformatics 2010, 26:1895-1896.

44.       Qi J, Zhao F: inGAP-sv: a novel scheme to identify and visualize structural variation from paired end mapping data. Nucleic Acids Res 2011, 39:W567-575.

45.       Zhang J, Wu Y: SVseq: an approach for detecting exact breakpoints of deletions with low-coverage sequence data. Bioinformatics 2011, 27:3228-3234.

46.       Nord AS, Lee M, King MC, Walsh T: Accurate and exact CNV identification from targeted high-throughput sequence data. BMC Genomics 2011, 12:184.

All of the above content is adapted from "Min Zhao, Qingguo Wang, Quan Wang, Peilin Jia, Zhongming Zhao: Computational tools for copy number variation (CNV) detection using next-generation sequencing data: features and perspectives. BMC Bioinformatics. Accepted".



 


 
  Last Modified: 2014-4-9