Dataset Information

RareVar: A Framework for Detecting Low-Frequency Single-Nucleotide Variants.


ABSTRACT: Accurate identification of low-frequency somatic point mutations in tumor samples has important clinical utility. Although high-throughput sequencing technology can capture such variants when primary tumor samples are sequenced, our ability to detect them accurately is compromised when the variant frequency is close to the sequencer error rate. Most current experimental and bioinformatic strategies target mutations with ≥5% allele frequency, which limits our ability to understand cancer etiology and tumor evolution. We present an experimental and computational modeling framework, RareVar, to reliably identify low-frequency single-nucleotide variants from high-throughput sequencing data under standard experimental protocols. The RareVar protocol includes a benchmark design that pools DNA from already sequenced individuals at various concentrations to target variants at desired frequencies, 0.5%-3% in our case. By applying a position-specific error model based on a generalized linear model, followed by machine-learning-based variant calibration, our approach outperforms existing methods. Our method can be applied to most capture and sequencing platforms without modifying the experimental protocol.
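
The pooling arithmetic behind the benchmark design can be illustrated with a minimal sketch. It is not the authors' code: the function name, the two-individual pool, and the mixing fractions are hypothetical, and it assumes diploid sites with no copy-number changes and equal genome copies per unit of DNA. Under those assumptions, a heterozygous variant carried by an individual who contributes a fraction f of the pooled DNA is expected at an allele frequency of f/2.

```python
def pooled_allele_frequency(fractions, alt_allele_counts, ploidy=2):
    """Expected variant allele frequency at one site in a DNA pool.

    fractions: share of total DNA contributed by each individual (sums to 1).
    alt_allele_counts: number of alternate alleles each individual carries
        at the site (0, 1, or 2 for a diploid genome).
    """
    if len(fractions) != len(alt_allele_counts):
        raise ValueError("one genotype per individual is required")
    if abs(sum(fractions) - 1.0) > 1e-9:
        raise ValueError("mixing fractions must sum to 1")
    # Each individual contributes its DNA share times its alt-allele dosage.
    return sum(f * a / ploidy for f, a in zip(fractions, alt_allele_counts))


# Hypothetical pool: a heterozygous carrier spiked in at 1% of total DNA,
# with the remaining 99% coming from a homozygous-reference individual.
low_end = pooled_allele_frequency([0.01, 0.99], [1, 0])

# Hypothetical pool targeting the upper end of the 0.5%-3% range.
high_end = pooled_allele_frequency([0.06, 0.94], [1, 0])
```

Under these assumptions, a 1% spike-in of a heterozygous carrier targets the 0.5% end of the range stated in the abstract, and a 6% spike-in targets the 3% end.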

SUBMITTER: Hao Y 

PROVIDER: S-EPMC5510701 | biostudies-literature | 2017 Jul

REPOSITORIES: biostudies-literature

Publications

RareVar: A Framework for Detecting Low-Frequency Single-Nucleotide Variants.

Hao Y, Xuei X, Li L, Nakshatri H, Edenberg HJ, Liu Y

Journal of Computational Biology: a journal of computational molecular cell biology. 2017 May 25; issue 7

Similar Datasets

| S-EPMC5001245 | biostudies-literature
| S-EPMC8144375 | biostudies-literature
2019-09-30 | GSE138130 | GEO
| S-EPMC6429602 | biostudies-literature
| S-EPMC4641353 | biostudies-literature
| S-EPMC3169821 | biostudies-other