
Dataset Information


CHOPER filters enable rare mutation detection in complex mutagenesis populations by next-generation sequencing.


ABSTRACT: Next-generation sequencing (NGS) has revolutionized genetics and enabled the accurate identification of many genetic variants across many genomes. However, detection of biologically important low-frequency variants within genetically heterogeneous populations remains challenging, because they are difficult to distinguish from intrinsic NGS errors. Approaches to overcome these limitations are essential to detect rare mutations in large cohorts, virus or microbial populations, mitochondrial heteroplasmy, and other heterogeneous mixtures such as tumors. Modifications in library preparation can overcome some of these limitations, but are experimentally challenging and restricted to skilled biologists. This paper describes a novel quality filtering and base pruning pipeline, called Complex Heterogeneous Overlapped Paired-End Reads (CHOPER), designed to detect sequence variants in a complex population with high sequence similarity derived from All-Codon-Scanning (ACS) mutagenesis. A novel fast alignment algorithm, designed for this application, has O(n) time complexity. CHOPER was applied to a p53 cancer mutant reactivation study derived from ACS mutagenesis. Relative to error filtering based on Phred quality scores, CHOPER improved accuracy by about 13% while discarding only half as many bases. These results are a step toward extending the power of NGS to the analysis of genetically heterogeneous populations.
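
For context on the baseline the abstract compares against, the following is a minimal sketch of error filtering based on Phred quality scores. It is not the CHOPER pipeline and is not taken from the paper. The function names, the Phred+33 FASTQ encoding, and the threshold of Q30 are all assumptions chosen for illustration. The only fixed fact used is the standard Phred definition, in which a score Q corresponds to an error probability of 10^(-Q/10).

```python
def phred_to_error_prob(q: int) -> float:
    """Convert a Phred score Q to the probability that the base call is wrong."""
    return 10 ** (-q / 10)


def decode_phred33(quality_string: str) -> list[int]:
    """Decode a FASTQ quality string, assuming the Phred+33 ASCII encoding."""
    return [ord(ch) - 33 for ch in quality_string]


def mask_low_quality(seq: str, quality_string: str, min_q: int = 30) -> str:
    """Replace bases whose Phred score is below min_q with 'N'.

    min_q=30 is an illustrative threshold, not a value from the paper.
    """
    scores = decode_phred33(quality_string)
    return "".join(
        base if q >= min_q else "N" for base, q in zip(seq, scores)
    )


if __name__ == "__main__":
    # 'I' encodes Q40 and '#' encodes Q2 under Phred+33.
    print(mask_low_quality("ACGT", "II#I"))
    print(phred_to_error_prob(30))
```

A filter of this kind treats each base independently, which is why a stricter threshold discards more bases. The abstract reports that CHOPER discarded about half as many bases as Phred-based filtering while improving accuracy.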

SUBMITTER: Salehi F 

PROVIDER: S-EPMC4333345 | biostudies-literature | 2015

REPOSITORIES: biostudies-literature


Publications

CHOPER filters enable rare mutation detection in complex mutagenesis populations by next-generation sequencing.

Faezeh Salehi, Roberta Baronio, Ryan Idrogo-Lam, Huy Vu, Linda V Hall, Peter Kaiser, Richard H Lathrop

PLoS ONE, 2015-02-18, issue 2



Similar Datasets

| S-EPMC3437896 | biostudies-other
| S-EPMC4920415 | biostudies-literature
| S-EPMC6395625 | biostudies-literature
| S-EPMC5244592 | biostudies-literature
| S-EPMC4810260 | biostudies-literature
| S-EPMC7890800 | biostudies-literature
| S-EPMC3984111 | biostudies-literature
| S-EPMC4618462 | biostudies-literature
| S-EPMC4332779 | biostudies-literature
2017-04-03 | PXD003804 | Pride