Dataset Information

CNV-P: a machine-learning framework for predicting high confident copy number variations.


ABSTRACT:

Background

Copy-number variants (CNVs) are recognized as one of the major causes of genetic disorders. Reliable detection of CNVs from genome sequencing data is in strong demand in disease research. However, current CNV detection software has high false-positive rates and needs further improvement.

Methods

Here, we propose a novel post-processing approach for CNV prediction (CNV-P), a machine-learning framework that efficiently removes false-positive fragments from the results of CNV detection tools. A series of CNV signals around the putative CNV fragments, such as read depth (RD), split reads (SR), and read pairs (RP), were defined as features to train a classifier.
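As a rough illustration of how RD, SR, and RP signals around a putative CNV might be turned into classifier features, the sketch below builds a small numeric feature vector for one candidate call. The `CandidateCNV` container, its field names, and the choice of features are assumptions made for illustration only; the abstract does not specify the actual feature definitions or the classifier used by CNV-P.

```python
from dataclasses import dataclass
from statistics import mean
from typing import List, Sequence


@dataclass
class CandidateCNV:
    """Hypothetical container for signals collected around one putative CNV."""
    depth_inside: Sequence[float]  # read depth (RD) sampled within the call
    depth_flank: Sequence[float]   # read depth sampled in the flanking regions
    split_reads: int               # split reads (SR) near the breakpoints
    discordant_pairs: int          # abnormally mapped read pairs (RP)
    length: int                    # size of the call in base pairs


def feature_vector(cnv: CandidateCNV) -> List[float]:
    """Convert the raw signals into a fixed-length numeric feature vector."""
    flank = mean(cnv.depth_flank)
    # Depth ratio relative to the flanks; in a diploid genome a value near
    # 0.5 is what a heterozygous deletion would be expected to produce.
    rd_ratio = mean(cnv.depth_inside) / flank if flank > 0 else 0.0
    return [
        rd_ratio,
        float(cnv.split_reads),
        float(cnv.discordant_pairs),
        float(cnv.length),
    ]


# Example: a 1 kb candidate whose depth is half that of its flanks.
example = CandidateCNV([10, 10, 10], [20, 20], split_reads=3,
                       discordant_pairs=5, length=1000)
features = feature_vector(example)
```

Vectors like `features`, paired with true/false labels from a validated call set, could then be passed to any standard supervised classifier.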

Results

Prediction results on several real biological datasets showed that our models could accurately classify CNVs with over 90% precision and 85% recall, greatly improving on the performance of state-of-the-art algorithms. Furthermore, our results indicate that CNV-P is robust to different CNV sizes and sequencing platforms.
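For readers less familiar with the two metrics reported here, the following minimal sketch shows how precision and recall are computed from predicted and true labels for a set of candidate CNVs. The function name and the toy labels are illustrative assumptions and are not taken from the CNV-P evaluation.

```python
from typing import Sequence, Tuple


def precision_recall(predicted: Sequence[bool],
                     truth: Sequence[bool]) -> Tuple[float, float]:
    """Return (precision, recall) for binary true/false CNV labels."""
    tp = sum(1 for p, t in zip(predicted, truth) if p and t)      # true positives
    fp = sum(1 for p, t in zip(predicted, truth) if p and not t)  # false positives
    fn = sum(1 for p, t in zip(predicted, truth) if not p and t)  # false negatives
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall


# Toy example: 4 calls kept by the classifier, 5 real CNVs in the truth set.
predicted = [True, True, True, True, False, False]
truth = [True, True, True, False, True, True]
precision, recall = precision_recall(predicted, truth)
```

In this toy case three of the four kept calls are real (precision 0.75) and three of the five real CNVs are recovered (recall 0.6).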

Conclusions

Our framework for classifying high-confidence CNVs could improve both basic research and clinical diagnosis of genetic diseases.

SUBMITTER: Wang T 

PROVIDER: S-EPMC8645205 | biostudies-literature | 2021

REPOSITORIES: biostudies-literature

Publications

CNV-P: a machine-learning framework for predicting high confident copy number variations.

Wang Taifu, Sun Jinghua, Zhang Xiuqing, Wang Wen-Jing, Zhou Qing

PeerJ, 2021-12-02


Similar Datasets

| S-EPMC8375180 | biostudies-literature
| S-EPMC11437066 | biostudies-literature
| S-EPMC4546278 | biostudies-literature
| S-EPMC9713397 | biostudies-literature
| S-EPMC8215577 | biostudies-literature
2011-08-06 | GSE23576 | GEO
| S-EPMC11601614 | biostudies-literature
| S-EPMC7016774 | biostudies-literature
| S-EPMC8681111 | biostudies-literature
| S-EPMC5868770 | biostudies-literature