On this page, we list computational methods for detecting CNVs from whole-genome and whole-exome sequencing data to help users find the right tools for calling CNVs. The resulting CNVs can then be submitted to our CNVannotator for further annotation.

As summarized in Figure 1 below, NGS-based CNV detection methods can be categorized into five strategies: (1) paired-end mapping (PEM), (2) split read (SR), (3) read depth (RD), (4) de novo assembly of a genome (AS), and (5) a combination of the above approaches (CB).

The paired-end mapping (PEM) strategy detects CNVs through discordantly mapped read pairs. A pair is discordant if the distance between its two mapped ends differs significantly from the average insert size of the library. By contrast, split read (SR)-based methods use the incompletely mapped read of each read pair to identify small CNVs. The read depth (RD)-based approach detects CNVs by counting the number of reads mapped to each genomic region; in the figure, reads are mapped to three exome regions. The assembly (AS)-based approach detects CNVs by mapping assembled contigs to the reference genome. The combinatorial (CB) approach combines the results of the four methods above; the example in the figure combines RD and PEM information to detect CNVs.

Each strategy has its own advantages and limitations. Although there has been great progress in every category, no single method can comprehensively detect all types of CNVs. As summarized in Tables 1-4 below, there are 6 PEM-based tools, 4 SR-based tools, 26 RD-based tools (15 for whole-genome data in Table 2 and 11 for exome data in Table 3), 3 AS-based tools, and 9 tools for combinatorial approaches.
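To make the PEM idea concrete, the following Python sketch labels read pairs by insert size. It is a minimal illustration under our own assumptions and does not reproduce the algorithm of any tool in Table 1. The `ReadPair` record and the `classify_pairs` function are hypothetical names. The expected insert size is estimated robustly from the median and the median absolute deviation (MAD) of all pairs. A pair lying more than `k` (default 3) scaled deviations from that median is treated as discordant.

```python
from dataclasses import dataclass
from statistics import median


@dataclass
class ReadPair:
    """Hypothetical minimal record: one read pair with both ends on the same chromosome."""
    name: str
    left_pos: int   # leftmost aligned base of the pair (0-based)
    right_end: int  # one past the rightmost aligned base of the pair

    @property
    def insert_size(self) -> int:
        return self.right_end - self.left_pos


def classify_pairs(pairs, k=3.0):
    """Label each pair as concordant or as discordant evidence for a deletion/insertion.

    The expected insert size is the median of all pairs, and its spread is the
    median absolute deviation scaled by 1.4826 (to approximate a standard
    deviation), so a few discordant pairs do not distort the cutoffs.
    """
    sizes = [p.insert_size for p in pairs]
    center = median(sizes)
    mad = median([abs(s - center) for s in sizes])
    spread = max(1.4826 * mad, 1.0)  # avoid a zero-width window when all sizes are equal
    labels = {}
    for p in pairs:
        deviation = p.insert_size - center
        if deviation > k * spread:
            labels[p.name] = "deletion"    # ends map farther apart than expected
        elif deviation < -k * spread:
            labels[p.name] = "insertion"   # ends map closer together than expected
        else:
            labels[p.name] = "concordant"
    return labels


if __name__ == "__main__":
    sizes = [290, 295, 300, 305, 310, 298, 302, 2000, 50]
    pairs = [ReadPair(f"pair{i}", 1000, 1000 + s) for i, s in enumerate(sizes)]
    print(classify_pairs(pairs))
```

In this toy input, `pair7` (insert 2000) is labeled as deletion evidence and `pair8` (insert 50) as insertion evidence. Because an insertion can only shorten the apparent insert, this signal detects insertions smaller than the library insert size. Real tools also cluster discordant pairs by position and orientation before calling an event.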

Table 1 - Summary of paired-end mapping (PEM), split read (SR), and de novo assembly (AS)-based tools for CNV detection using NGS data

Method	URL	Language	Input	Comments	Ref.
PEM-based
BreakDancer	http://breakdancer.sourceforge.net/	Perl, C++	Alignment files	Predicting insertions, deletions, inversions, inter- and intra-chromosomal translocations	[1]
PEMer	http://sv.gersteinlab.org/pemer/	Perl, Python	FASTA	Using simulation-based error models to call SVs	[2]
VariationHunter	http://compbio.cs.sfu.ca/strvar.htm	C	DIVET^a	Detecting insertions, deletions and inversions	[3]
commonLAW	http://compbio.cs.sfu.ca/strvar.htm	C++	Alignment files	Detecting SVs in multiple samples simultaneously using a maximum parsimony model	[4]
GASV	http://code.google.com/p/gasv/	Java	BAM	A geometric approach for classification and comparison of structural variants	[5]
Spanner	N/A	N/A	N/A	Using PEM to detect tandem duplications	[6]
SR-based
AGE	http://sv.gersteinlab.org/age	C++	FASTA	A dynamic-programming algorithm using optimal alignments with gap excision to detect breakpoints	[7]
Pindel	http://www.ebi.ac.uk/~kye/pindel/	C++	BAM / FASTQ	Using a pattern growth approach to identify breakpoints of various SVs	[8]
SLOPE	http://www-genepi.med.utah.edu/suppl/SLOPE	C++	SAM/FASTQ/MAQ^b	Locating SVs from targeted sequencing data	[9]
SRiC	N/A	N/A	BLAT output	Calibrating SV calling using realistic error models	[10]
AS-based
Magnolya	http://sourceforge.net/projects/magnolya/	Python	FASTA	Calling CNVs from co-assembled genomes and estimating copy number with a Poisson mixture model	[11]
Cortex assembler	http://cortexassembler.sourceforge.net/	C	FASTQ / FASTA	Using colored de Bruijn graphs for de novo assembly and genotyping of SVs	[12]
TIGRA-SV	http://gmt.genome.wustl.edu/tigra-sv/	C	SV calls^c + BAM	Local assembly of SVs using the iterative graph routing assembly (TIGRA) algorithm	N/A

^aA VariationHunter-specific input format that includes reads with multiple alignments.

^bFile format from MAQ mapview.

^cA file of structural variations detected by other tools.
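The SR signal used by the split-read tools in Table 1 can be illustrated with a similar sketch. We assume each read already has a split alignment, meaning a 5' segment and a 3' segment aligned on the same chromosome and strand. The `SplitAlignment` record, `infer_event` and `call_events` are hypothetical names, and the minimum support of two reads is an arbitrary choice. A gap between the two segments on the reference implies a deletion. A 3' segment that maps upstream of the 5' segment is the usual signature of a tandem duplication junction.

```python
from collections import Counter
from typing import NamedTuple, Optional


class SplitAlignment(NamedTuple):
    """Hypothetical record for one read split into two same-strand segments on one chromosome."""
    read_name: str
    prefix_ref_end: int    # reference position just past the last base of the 5' segment (0-based)
    suffix_ref_start: int  # reference position of the first base of the 3' segment (0-based)


def infer_event(aln: SplitAlignment) -> Optional[tuple]:
    """Infer a simple (type, start, end) event from one split read; end is exclusive."""
    gap = aln.suffix_ref_start - aln.prefix_ref_end
    if gap > 0:
        # Reference bases between the segments are missing from the read: deletion.
        return ("DEL", aln.prefix_ref_end, aln.suffix_ref_start)
    if gap < 0:
        # The read jumps back upstream: junction of a tandem duplication.
        return ("DUP", aln.suffix_ref_start, aln.prefix_ref_end)
    return None  # segments are contiguous on the reference: no event


def call_events(alignments, min_support=2):
    """Keep events whose exact breakpoints are supported by >= min_support split reads."""
    counts = Counter(e for e in map(infer_event, alignments) if e is not None)
    return sorted((event, n) for event, n in counts.items() if n >= min_support)
```

Grouping reads by identical breakpoints is the simplest form of evidence aggregation. It works here because split reads report base-pair breakpoints. Real tools also tolerate small breakpoint offsets and realign the unmapped portion of the read themselves.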

Table 2 - Read depth (RD)-based tools for CNV detection using whole genome sequencing data

Tool	URL	Language	Input	Comments	Ref.
SegSeq^a	http://www.broadinstitute.org/cgi-bin/cancer/publications/pub_paper.cgi?mode=view&paper_id=182	Matlab	Aligned read positions	Detecting CNV breakpoints using massively parallel sequence data	[13]
CNV-seq^a	http://tiger.dbs.nus.edu.sg/cnv-seq/	Perl, R	Aligned read positions	Identifying CNVs using the difference of observed copy number ratios	[14]
RDXplorer^b	http://rdxplorer.sourceforge.net/	Python, Shell	BAM	Detecting CNVs through event-wise testing algorithm on normalized read depth of coverage	[15]
BIC-seq^a	http://compbio.med.harvard.edu/Supplements/PNAS11.html	Perl, R	BAM	Using the Bayesian information criterion to detect CNVs based on uniquely mapped reads	[16]
CNAseg^a	http://www.compbio.group.cam.ac.uk/software/cnaseg	R	BAM	Using flowcell-to-flowcell variability in cancer and control samples to reduce false positives	[17]
cn.MOPS^b	http://www.bioinf.jku.at/software/cnmops/	R	BAM/read count matrices	Modelling read depths across samples at each genomic position using a mixture of Poissons model	[18]
JointSLM^b	http://nar.oxfordjournals.org/content/suppl/2011/02/16/gkr068.DC1/JointSLM_R_Package.zip	R	SAM/BAM	Population-based approach to detect common CNVs using read depth data	[19]
ReadDepth	http://code.google.com/p/readdepth/	R	BED files	Using breakpoints to increase the resolution of CNV detection from low-coverage reads	[20]
rSW-seq^a	http://compbio.med.harvard.edu/Supplements/BMCBioinfo10-2.html	C	Aligned read positions	Identifying CNVs by comparing matched tumor and control samples	[21]
CNVnator	http://sv.gersteinlab.org/	C++	BAM	Using a mean-shift approach with multiple-bandwidth partitioning and GC correction	[22]
CNAnorm^a	http://www.precancer.leeds.ac.uk/cnanorm	R	Aligned read positions	Estimating the level of contamination with normal cells	[23]
CMDS^b	https://dsgweb.wustl.edu/qunyuan/software/cmds	C, R	Aligned read positions	Discovering CNVs from multiple samples	[24]
mrCaNaVar	http://mrcanavar.sourceforge.net/	C	SAM	A tool to detect large segmental duplications and insertions	[25]
CNVeM	N/A	N/A	N/A	Predicting CNV breakpoints in base-pair resolution	[26]
cnvHMM	http://genome.wustl.edu/software/cnvhmm	C	Consensus sequence from SAMtools	Using a hidden Markov model (HMM) to detect CNVs	N/A

^aTools require matched case-control samples as input.

^bTools use multiple samples as input.
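For the RD-based tools that take matched case-control samples (footnote a), the core signal is a depth ratio between the two samples. The sketch below works in three steps:

* It bins read start positions into fixed windows.
* It computes a library-size-normalized log2(case/control) ratio per bin.
* It merges consecutive bins beyond fixed thresholds into segments.

The function names, the 0.5 pseudocount and the ±0.4 thresholds are our own assumptions. Published tools replace the fixed thresholds with statistical procedures, such as the Bayesian information criterion (BIC-seq) or mean-shift partitioning (CNVnator), and often correct for GC content.

```python
import math


def bin_counts(read_starts, chrom_length, bin_size):
    """Count aligned read start positions (0-based, < chrom_length) in fixed-size bins."""
    counts = [0] * math.ceil(chrom_length / bin_size)
    for pos in read_starts:
        counts[pos // bin_size] += 1
    return counts


def log2_ratios(case_counts, control_counts, pseudocount=0.5):
    """Library-size-normalized log2(case/control) ratio per bin.

    Dividing by each sample's total read count removes differences in
    sequencing depth; the pseudocount avoids log(0) and division by zero.
    """
    case_total, control_total = sum(case_counts), sum(control_counts)
    return [
        math.log2(((c + pseudocount) / case_total) / ((r + pseudocount) / control_total))
        for c, r in zip(case_counts, control_counts)
    ]


def call_segments(ratios, bin_size, gain=0.4, loss=-0.4):
    """Merge runs of consecutive bins beyond fixed log2 thresholds into segments.

    Returns (start, end, state) tuples in base-pair coordinates, end exclusive.
    """
    segments = []
    current = None  # (start_bin, state) of the run being extended
    for i, value in enumerate(ratios + [0.0]):  # trailing sentinel closes the last run
        state = "gain" if value >= gain else "loss" if value <= loss else None
        if current is not None and current[1] != state:
            segments.append((current[0] * bin_size, i * bin_size, current[1]))
            current = None
        if state is not None and current is None:
            current = (i, state)
    return segments
```

In a pure diploid sample, a one-copy loss is expected at log2(1/2) = -1 and a one-copy gain at log2(3/2) ≈ 0.58. Contamination with normal cells pulls both toward 0, which is the problem CNAnorm addresses.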

Table 3 - Summary of bioinformatics tools for CNV detection using exome sequencing data

Tool	URL	Language	Input	Comments	Ref.
Control-FREEC^a	http://bioinfo-out.curie.fr/projects/freec/	C++	SAM, BAM, pileup, Eland, BED, SOAP, arachne, psl (BLAT) and Bowtie formats	Correcting copy number using matched case-control samples or GC content	[27]
CoNIFER^b	http://conifer.sf.net/	Python	BAM	Using singular value decomposition to normalize copy number and integrating multiple samples to avoid batch bias	[28]
XHMM^b	http://atgu.mgh.harvard.edu/xhmm/	C++	BAM	Using principal component analysis to normalize copy number and a hidden Markov model (HMM) to detect CNVs	[29]
ExomeCNV^c	http://cran.r-project.org/web/packages/ExomeCNV	R	BAM/pileup	Using read depth and B-allele frequencies from exome sequencing data to detect CNVs and LOHs	[30]
CONTRA^c	http://contra-cnv.sourceforge.net/	Python	SAM/BAM	Comparing base-level log-ratios calculated from read depth between case and control samples	[31]
CONDEX	http://code.google.com/p/condr/	Java	Sorted BED files	Using an HMM to identify CNVs	[32]
SeqGene	http://seqgene.sourceforge.net	Python, R	SAM/pileup	Calling variants, including CNVs, from exome sequencing data	[33]
PropSeq^c	http://bioinformatics.nki.nl/ocs/	R, C	N/A	Using the read depth of the case sample as a linear function of that of control sample to detect CNVs	[34]
VarScan2^c	http://genome.wustl.edu/software/varscan	Java	BAM/pileup	Using pairwise comparisons of the normalized read depth at each position to estimate CNV	[35]
ExoCNVTest^b	http://www1.imperial.ac.uk/medicine/people/l.coin/	Java, R	BAM	Identifying and genotyping common CNVs associated with complex disease	[36]
ExomeDepth^b	http://cran.r-project.org/web/packages/ExomeDepth/index.html	R	BAM	Using a beta-binomial model to fit the read depth of whole-exome sequencing (WES) data	[37]

^aControl-FREEC accepts either matched case-control samples or a single sample as input.

^bTools use multiple samples as input.

^cTools require matched case-control samples as input.

Table 4 - Combinatorial bioinformatics tools for CNV detection using NGS data

Method	URL	Language	Input	Combination^a	Ref.
NovelSeq	http://compbio.cs.sfu.ca/strvar.htm	C	FASTA/SAM	PEM+AS	[38]
HYDRA	http://code.google.com/p/hydra-sv/	Python	Discordant paired-end mappings	PEM+AS	[39]
CNVer	http://compbio.cs.toronto.edu/CNVer/	Perl, C++	BAM/aligned positions	PEM+RD	[40]
GASVPro	http://code.google.com/p/gasv/	C++	BAM	PEM+RD	[41]
Genome STRiP	http://www.broadinstitute.org/software/genomestrip/genome-strip	Java, R	BAM	PEM+RD	[42]
SVDetect	http://svdetect.sourceforge.net/	Perl	SAM/BAM/ELAND	PEM+RD	[43]
inGAP-sv	http://ingap.sourceforge.net/	Java	SAM	PEM+RD	[44]
SVseq	http://www.engr.uconn.edu/~jiz08001/svseq.html	C	FASTQ/BAM	PEM+SR	[45]
Nord et al.	N/A	N/A	N/A	RD+SR	[46]

^aRD: read depth-based approach; PEM: paired-end mapping approach; SR: split read approach; AS: de novo assembly approach.
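Most combinatorial tools in Table 4 pair discordant read pairs with read depth (PEM+RD). The sketch below shows the simplest form of that idea under our own assumptions. It takes deletion candidates, represented by a hypothetical `Candidate` record such as the output of clustering discordant pairs. A candidate is kept only if it has enough discordant-pair support and the mean per-base depth inside its interval falls below 0.75 of the genome-wide mean. The names and thresholds are illustrative. GASVPro, for instance, instead integrates both signals in a probabilistic model [41].

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    """Hypothetical deletion candidate produced by clustering discordant read pairs."""
    start: int             # 0-based start of the candidate interval
    end: int               # exclusive end; assumed end > start
    discordant_pairs: int  # number of supporting discordant pairs


def depth_ratio(depth, start, end, genome_mean):
    """Mean per-base depth inside [start, end) relative to the genome-wide mean depth."""
    window = depth[start:end]
    return (sum(window) / len(window)) / genome_mean


def confirm_deletions(candidates, depth, min_pairs=3, max_ratio=0.75):
    """Keep PEM candidates that are also supported by a drop in read depth.

    A heterozygous deletion in a diploid genome should roughly halve coverage
    (ratio ~0.5) and a homozygous one should remove it (ratio ~0), so requiring
    both enough discordant pairs and a low ratio combines two independent signals.
    """
    genome_mean = sum(depth) / len(depth)
    confirmed = []
    for c in candidates:
        ratio = depth_ratio(depth, c.start, c.end, genome_mean)
        if c.discordant_pairs >= min_pairs and ratio <= max_ratio:
            confirmed.append((c.start, c.end, round(ratio, 2)))
    return confirmed
```

Requiring agreement between the two signals trades some sensitivity for fewer false positives. Discordant pairs caused by mapping artifacts rarely coincide with a real depth drop, and depth noise alone rarely has paired-end support.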

References

1. Chen K, Wallis JW, McLellan MD, Larson DE, Kalicki JM, Pohl CS, McGrath SD, Wendl MC, Zhang QY, Locke DP, et al: BreakDancer: an algorithm for high-resolution mapping of genomic structural variation. Nat Methods 2009, 6:677-681.

2. Korbel JO, Abyzov A, Mu XJ, Carriero N, Cayting P, Zhang ZD, Snyder M, Gerstein MB: PEMer: a computational framework with simulation-based error models for inferring genomic structural variants from massive paired-end sequencing data. Genome Biol 2009, 10:R23.

3. Hormozdiari F, Hajirasouliha I, Dao P, Hach F, Yorukoglu D, Alkan C, Eichler EE, Sahinalp SC: Next-generation VariationHunter: combinatorial algorithms for transposon insertion discovery. Bioinformatics 2010, 26:i350-357.

4. Hormozdiari F, Hajirasouliha I, McPherson A, Eichler EE, Sahinalp SC: Simultaneous structural variation discovery among multiple paired-end sequenced genomes. Genome Res 2011, 21:2203-2212.

5. Sindi S, Helman E, Bashir A, Raphael BJ: A geometric approach for classification and comparison of structural variants. Bioinformatics 2009, 25:i222-230.

6. Mills RE, Walter K, Stewart C, Handsaker RE, Chen K, Alkan C, Abyzov A, Yoon SC, Ye K, Cheetham RK, et al: Mapping copy number variation by population-scale genome sequencing. Nature 2011, 470:59-65.

7. Abyzov A, Gerstein M: AGE: defining breakpoints of genomic structural variants at single-nucleotide resolution, through optimal alignments with gap excision. Bioinformatics 2011, 27:595-603.

8. Ye K, Schulz MH, Long Q, Apweiler R, Ning Z: Pindel: a pattern growth approach to detect break points of large deletions and medium sized insertions from paired-end short reads. Bioinformatics 2009, 25:2865-2871.

9. Abel HJ, Duncavage EJ, Becker N, Armstrong JR, Magrini VJ, Pfeifer JD: SLOPE: a quick and accurate method for locating non-SNP structural variation from targeted next-generation sequence data. Bioinformatics 2010, 26:2684-2688.

10. Zhang ZD, Du J, Lam H, Abyzov A, Urban AE, Snyder M, Gerstein M: Identification of genomic indels and structural variations using split reads. BMC Genomics 2011, 12:375.

11. Nijkamp JF, van den Broek MA, Geertman JM, Reinders MJ, Daran JM, de Ridder D: De novo detection of copy number variation by co-assembly. Bioinformatics 2012.

12. Iqbal Z, Caccamo M, Turner I, Flicek P, McVean G: De novo assembly and genotyping of variants using colored de Bruijn graphs. Nat Genet 2012, 44:226-232.

13. Chiang DY, Getz G, Jaffe DB, O'Kelly MJ, Zhao X, Carter SL, Russ C, Nusbaum C, Meyerson M, Lander ES: High-resolution mapping of copy-number alterations with massively parallel sequencing. Nat Methods 2009, 6:99-103.

14. Xie C, Tammi MT: CNV-seq, a new method to detect copy number variation using high-throughput sequencing. BMC Bioinformatics 2009, 10:80.

15. Yoon S, Xuan Z, Makarov V, Ye K, Sebat J: Sensitive and accurate detection of copy number variants using read depth of coverage. Genome Res 2009, 19:1586-1592.

16. Xi R, Hadjipanayis AG, Luquette LJ, Kim TM, Lee E, Zhang J, Johnson MD, Muzny DM, Wheeler DA, Gibbs RA, et al: Copy number variation detection in whole-genome sequencing data using the Bayesian information criterion. Proc Natl Acad Sci U S A 2011, 108:E1128-1136.

17. Ivakhno S, Royce T, Cox AJ, Evers DJ, Cheetham RK, Tavare S: CNAseg--a novel framework for identification of copy number changes in cancer from second-generation sequencing data. Bioinformatics 2010, 26:3051-3058.

18. Klambauer G, Schwarzbauer K, Mayr A, Clevert DA, Mitterecker A, Bodenhofer U, Hochreiter S: cn.MOPS: mixture of Poissons for discovering copy number variations in next-generation sequencing data with a low false discovery rate. Nucleic Acids Res 2012, 40:e69.

19. Magi A, Benelli M, Yoon S, Roviello F, Torricelli F: Detecting common copy number variants in high-throughput sequencing data by using JointSLM algorithm. Nucleic Acids Res 2011, 39:e65.

20. Miller CA, Hampton O, Coarfa C, Milosavljevic A: ReadDepth: a parallel R package for detecting copy number alterations from short sequencing reads. PLoS One 2011, 6:e16327.

21. Kim TM, Luquette LJ, Xi R, Park PJ: rSW-seq: algorithm for detection of copy number alterations in deep sequencing data. BMC Bioinformatics 2010, 11:432.

22. Abyzov A, Urban AE, Snyder M, Gerstein M: CNVnator: an approach to discover, genotype, and characterize typical and atypical CNVs from family and population genome sequencing. Genome Res 2011, 21:974-984.

23. Gusnanto A, Wood HM, Pawitan Y, Rabbitts P, Berri S: Correcting for cancer genome size and tumour cell content enables better estimation of copy number alterations from next-generation sequence data. Bioinformatics 2012, 28:40-47.

24. Zhang Q, Ding L, Larson DE, Koboldt DC, McLellan MD, Chen K, Shi X, Kraja A, Mardis ER, Wilson RK, et al: CMDS: a population-based method for identifying recurrent DNA copy number aberrations in cancer from high-resolution data. Bioinformatics 2010, 26:464-469.

25. Alkan C, Kidd JM, Marques-Bonet T, Aksay G, Antonacci F, Hormozdiari F, Kitzman JO, Baker C, Malig M, Mutlu O, et al: Personalized copy number and segmental duplication maps using next-generation sequencing. Nat Genet 2009, 41:1061-1067.

26. Wang Z, Hormozdiari F, Yang W-Y, Halperin E, Eskin E: CNVeM: Copy Number Variation Detection Using Uncertainty of Read Mapping. In Research in Computational Molecular Biology. Edited by Chor B. Berlin/Heidelberg: Springer; 2012: 326-340. [Lecture Notes in Computer Science, vol. 7262.]

27. Boeva V, Zinovyev A, Bleakley K, Vert JP, Janoueix-Lerosey I, Delattre O, Barillot E: Control-free calling of copy number alterations in deep-sequencing data using GC-content normalization. Bioinformatics 2011, 27:268-269.

28. Krumm N, Sudmant PH, Ko A, O'Roak BJ, Malig M, Coe BP, Quinlan AR, Nickerson DA, Eichler EE: Copy number variation detection and genotyping from exome sequence data. Genome Res 2012, 22:1525-1532.

29. Fromer M, Moran JL, Chambert K, Banks E, Bergen SE, Ruderfer DM, Handsaker RE, McCarroll SA, O'Donovan MC, Owen MJ, et al: Discovery and Statistical Genotyping of Copy-Number Variation from Whole-Exome Sequencing Depth. Am J Hum Genet 2012, 91:597-607.

30. Sathirapongsasuti JF, Lee H, Horst BA, Brunner G, Cochran AJ, Binder S, Quackenbush J, Nelson SF: Exome sequencing-based copy-number variation and loss of heterozygosity detection: ExomeCNV. Bioinformatics 2011, 27:2648-2654.

31. Li J, Lupat R, Amarasinghe KC, Thompson ER, Doyle MA, Ryland GL, Tothill RW, Halgamuge SK, Campbell IG, Gorringe KL: CONTRA: copy number analysis for targeted resequencing. Bioinformatics 2012, 28:1307-1313.

32. Ramachandran A, Micsinai M, Pe'er I: CONDEX: Copy number detection in exome sequences. In Bioinformatics and Biomedicine Workshops (BIBMW), 2011 IEEE International Conference on; 12-15 Nov. 2011. 2011: 87-93.

33. Deng X: SeqGene: a comprehensive software solution for mining exome- and transcriptome- sequencing data. BMC Bioinformatics 2011, 12:267.

34. Rigaill GJ, Cadot S, Kluin RJ, Xue Z, Bernards R, Majewski IJ, Wessels LF: A regression model for estimating DNA copy number applied to capture sequencing data. Bioinformatics 2012, 28:2357-2365.

35. Koboldt DC, Zhang Q, Larson DE, Shen D, McLellan MD, Lin L, Miller CA, Mardis ER, Ding L, Wilson RK: VarScan 2: somatic mutation and copy number alteration discovery in cancer by exome sequencing. Genome Res 2012, 22:568-576.

36. Coin LJ, Cao D, Ren J, Zuo X, Sun L, Yang S, Zhang X, Cui Y, Li Y, Jin X, Wang J: An exome sequencing pipeline for identifying and genotyping common CNVs associated with disease with application to psoriasis. Bioinformatics 2012, 28:i370-i374.

37. Plagnol V, Curtis J, Epstein M, Mok KY, Stebbings E, Grigoriadou S, Wood NW, Hambleton S, Burns SO, Thrasher AJ, et al: A robust model for read count data in exome sequencing experiments and implications for copy number variant calling. Bioinformatics 2012, 28:2747-2754.

38. Hajirasouliha I, Hormozdiari F, Alkan C, Kidd JM, Birol I, Eichler EE, Sahinalp SC: Detection and characterization of novel sequence insertions using paired-end next-generation sequencing. Bioinformatics 2010, 26:1277-1283.

39. Quinlan AR, Clark RA, Sokolova S, Leibowitz ML, Zhang Y, Hurles ME, Mell JC, Hall IM: Genome-wide mapping and assembly of structural variant breakpoints in the mouse genome. Genome Res 2010, 20:623-635.

40. Medvedev P, Fiume M, Dzamba M, Smith T, Brudno M: Detecting copy number variation with mated short reads. Genome Res 2010, 20:1613-1622.

41. Sindi SS, Onal S, Peng LC, Wu HT, Raphael BJ: An integrative probabilistic model for identification of structural variation in sequencing data. Genome Biol 2012, 13:R22.

42. Handsaker RE, Korn JM, Nemesh J, McCarroll SA: Discovery and genotyping of genome structural polymorphism by sequencing on a population scale. Nat Genet 2011, 43:269-276.

43. Zeitouni B, Boeva V, Janoueix-Lerosey I, Loeillet S, Legoix-Né P, Nicolas A, Delattre O, Barillot E: SVDetect: a tool to identify genomic structural variations from paired-end and mate-pair sequencing data. Bioinformatics 2010, 26:1895-1896.

44. Qi J, Zhao F: inGAP-sv: a novel scheme to identify and visualize structural variation from paired end mapping data. Nucleic Acids Res 2011, 39:W567-575.

45. Zhang J, Wu Y: SVseq: an approach for detecting exact breakpoints of deletions with low-coverage sequence data. Bioinformatics 2011, 27:3228-3234.

46. Nord AS, Lee M, King MC, Walsh T: Accurate and exact CNV identification from targeted high-throughput sequence data. BMC Genomics 2011, 12:184.

All of the above content is adapted from: Min Zhao, Qingguo Wang, Quan Wang, Peilin Jia, Zhongming Zhao: "Computational tools for copy number variation (CNV) detection using next-generation sequencing data: features and perspectives". BMC Bioinformatics (accepted).

Copyright © 2016-Present The University of Texas Health Science Center at Houston. All Rights Reserved.

	Last Modified: 2014-4-9