Dataset Information

A machine-learning approach for accurate detection of copy number variants from exome sequencing.

ABSTRACT: Copy number variants (CNVs) are a major cause of several genetic disorders, making their detection an essential component of genetic analysis pipelines. Current methods for detecting CNVs from exome-sequencing data are limited by high false-positive rates and low concordance because of inherent biases of individual algorithms. To overcome these issues, calls generated by two or more algorithms are often intersected using Venn diagram approaches to identify "high-confidence" CNVs. However, this approach is inadequate, because it misses potentially true calls that do not have consensus from multiple callers. Here, we present CN-Learn, a machine-learning framework that integrates calls from multiple CNV detection algorithms and learns to accurately identify true CNVs using caller-specific and genomic features from a small subset of validated CNVs. Using CNVs predicted by four exome-based CNV callers (CANOES, CODEX, XHMM, and CLAMMS) from 503 samples, we demonstrate that CN-Learn identifies true CNVs at higher precision (∼90%) and recall (∼85%) rates while maintaining robust performance even when trained with minimal data (∼30 samples). CN-Learn recovers twice as many CNVs as individual callers or Venn diagram-based approaches, with features such as exome capture probe count, caller concordance, and GC content providing the most discriminatory power. In fact, ∼58% of all true CNVs recovered by CN-Learn were either singletons or calls that lacked support from at least one caller. Our study underscores the limitations of current approaches for CNV identification and provides an effective method that yields high-quality CNVs for application in clinical diagnostics.
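To make the contrast between the Venn diagram approach and feature-based integration concrete, the following is a minimal sketch, not CN-Learn's implementation. It computes two of the features named above (caller concordance and exome capture probe count) for each call, applies a Venn-style "at least two callers" filter, and measures precision and recall against a set of validated calls. The Call record, the 50% overlap threshold, the helper names (overlap_fraction, concordance, probe_count, venn_consensus, features, precision_recall), and all coordinates and validation labels are hypothetical assumptions for illustration only; GC content and the classifier that CN-Learn trains on validated CNVs are not shown.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Call:
    """One CNV call from one caller; coordinates assumed 0-based, half-open."""
    caller: str
    chrom: str
    start: int
    end: int
    cn_type: str  # "DEL" or "DUP"

def overlap_fraction(a: Call, b: Call) -> float:
    """Fraction of call a covered by call b (0.0 if chromosome or CNV type differ)."""
    if a.chrom != b.chrom or a.cn_type != b.cn_type:
        return 0.0
    overlap = min(a.end, b.end) - max(a.start, b.start)
    return max(overlap, 0) / (a.end - a.start)

def concordance(call: Call, calls: list[Call], min_overlap: float = 0.5) -> int:
    """Number of *other* callers with a call covering >= min_overlap of this call
    (the 0.5 threshold is an assumption, not a value from the study)."""
    return len({c.caller for c in calls
                if c.caller != call.caller and overlap_fraction(call, c) >= min_overlap})

def probe_count(call: Call, probes: list[tuple[str, int, int]]) -> int:
    """Number of capture-probe intervals (chrom, start, end) overlapping the call."""
    return sum(1 for chrom, s, e in probes
               if chrom == call.chrom and s < call.end and call.start < e)

def venn_consensus(calls: list[Call], min_callers: int = 2) -> list[Call]:
    """Venn-style filter: keep a call only if >= min_callers callers (itself included) report it."""
    return [c for c in calls if 1 + concordance(c, calls) >= min_callers]

def features(call: Call, calls: list[Call], probes: list[tuple[str, int, int]]) -> dict:
    """Hypothetical per-call feature vector that a classifier could be trained on."""
    return {"concordance": concordance(call, calls),
            "probe_count": probe_count(call, probes),
            "size_bp": call.end - call.start}

def precision_recall(predicted: set[Call], truth: set[Call]) -> tuple[float, float]:
    """Precision and recall of predicted calls against validated (true) calls."""
    tp = len(predicted & truth)
    precision = tp / len(predicted) if predicted else 0.0
    recall = tp / len(truth) if truth else 0.0
    return precision, recall

# Toy, made-up calls and labels (not data from the study).
calls = [
    Call("CANOES", "chr16", 100, 200, "DEL"),
    Call("XHMM", "chr16", 110, 210, "DEL"),
    Call("CLAMMS", "chr1", 500, 600, "DUP"),  # singleton, validated as true
    Call("CODEX", "chr2", 900, 950, "DEL"),   # singleton, validated as false
]
probes = [("chr16", 120, 140), ("chr16", 150, 170), ("chr1", 550, 560)]
truth = {calls[0], calls[1], calls[2]}

kept = venn_consensus(calls)
print(precision_recall(set(kept), truth))   # (1.0, 0.6666666666666666)
print(precision_recall(set(calls), truth))  # (0.75, 1.0)
for c in calls:
    print(c.caller, features(c, calls, probes))
```

In this toy case the Venn filter discards the true singleton from CLAMMS, which is the kind of call the abstract reports CN-Learn recovering; a classifier trained on features like these can in principle keep such a call without requiring multi-caller support.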

SUBMITTER: Pounraja VK

PROVIDER: S-EPMC6633262 | biostudies-literature | 2019 Jul

REPOSITORIES: biostudies-literature

Publications

A machine-learning approach for accurate detection of copy number variants from exome sequencing.

Vijay Kumar Pounraja, Gopal Jayakar, Matthew Jensen, Neil Kelkar, Santhosh Girirajan

Genome Research (published 2019-06-06), issue 7

PMID: 31171634

Similar Datasets

ECOLE: Learning to call copy number variants on whole exome sequencing data.
| S-EPMC10762021 | biostudies-literature

Enhanced copy number variants detection from whole-exome sequencing data using EXCAVATOR2.
| S-EPMC5175347 | biostudies-literature

EXCAVATOR: detecting copy number variants from whole-exome sequencing data.
| S-EPMC4053953 | biostudies-literature

CANOES: detecting rare copy number variants from whole exome sequencing data.
| S-EPMC4081054 | biostudies-literature

VEGAWES: variational segmentation on whole exome sequencing for copy number detection.
| S-EPMC4587906 | biostudies-literature

Detection of clinically relevant copy-number variants by exome sequencing in a large cohort of genetic disorders.
| S-EPMC5460076 | biostudies-literature

RefCNV: Identification of Gene-Based Copy Number Variants Using Whole Exome Sequencing.
| S-EPMC4849420 | biostudies-literature

Polishing copy number variant calls on exome sequencing data via deep learning.
| S-EPMC9248885 | biostudies-literature

Comparative study of whole exome sequencing-based copy number variation detection tools.
| S-EPMC7059689 | biostudies-literature

Exome sequencing-based copy-number variation and loss of heterozygosity detection: ExomeCNV.
| S-EPMC3179661 | biostudies-literature