Dataset Information

Reducing false-positive incidental findings with ensemble genotyping and logistic regression based variant filtering methods.


ABSTRACT: As whole genome sequencing (WGS) uncovers variants associated with rare and common diseases, an immediate challenge is to minimize false-positive findings due to sequencing and variant calling errors. False positives can be reduced by combining results from orthogonal sequencing methods, but doing so is costly. Here, we present variant filtering approaches using logistic regression (LR) and ensemble genotyping to minimize false positives without sacrificing sensitivity. We evaluated the methods using paired WGS datasets of an extended family prepared using two sequencing platforms and a validated set of variants in NA12878. Using LR or ensemble genotyping based filtering, false-negative rates were significantly reduced by 1.1- to 17.8-fold at the same levels of false discovery rates (5.4% for heterozygous and 4.5% for homozygous single nucleotide variants (SNVs); 30.0% for heterozygous and 18.7% for homozygous insertions; 25.2% for heterozygous and 16.6% for homozygous deletions) compared with filtering based on genotype quality scores. Moreover, ensemble genotyping excluded > 98% (105,080 of 107,167) of false positives while retaining > 95% (897 of 937) of true positives in de novo mutation (DNM) discovery in NA12878, and performed better than a consensus method using two sequencing platforms. Our proposed methods were effective in prioritizing phenotype-associated variants, and ensemble genotyping would be essential to minimize false-positive DNM candidates.
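The DNM percentages quoted in the abstract can be checked directly from the counts it reports. The short sketch below only redoes that arithmetic; the helper name `fraction` and the variable names are illustrative and do not come from the paper.

```python
def fraction(part: int, whole: int) -> float:
    """Return part / whole as a float (hypothetical helper for this check)."""
    return part / whole


# Counts as stated in the abstract for DNM discovery in NA12878.
fp_excluded = fraction(105_080, 107_167)  # false positives removed by ensemble genotyping
tp_retained = fraction(897, 937)          # true positives kept after filtering

print(f"False positives excluded: {fp_excluded:.2%}")  # 98.05%
print(f"True positives retained:  {tp_retained:.2%}")  # 95.73%

# Counts left over after filtering, derived from the same numbers.
fp_remaining = 107_167 - 105_080  # 2087 false positives not excluded
tp_lost = 937 - 897               # 40 true positives not retained
```

Both values are consistent with the "> 98%" and "> 95%" figures in the abstract.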

SUBMITTER: Hwang KB 

PROVIDER: S-EPMC4112476 | biostudies-literature | 2014 Aug

REPOSITORIES: biostudies-literature

Publications

Reducing false-positive incidental findings with ensemble genotyping and logistic regression based variant filtering methods.

Hwang Kyu-Baek, Lee In-Hee, Park Jin-Ho, Hambuch Tina, Choe Yongjoon, Kim MinHyeok, Lee Kyungjoon, Song Taemin, Neu Matthew B, Gupta Neha, Kohane Isaac S, Green Robert C, Kong Sek Won

Human Mutation, 2014-06-24, issue 8


Similar Datasets

| S-EPMC4352885 | biostudies-literature
| S-EPMC3817965 | biostudies-literature
| S-EPMC5543767 | biostudies-other
2020-12-21 | PXD021677 | Pride
| S-EPMC6075720 | biostudies-literature
| S-EPMC7092376 | biostudies-literature
| S-EPMC5780421 | biostudies-literature
| S-EPMC2633005 | biostudies-literature
| S-EPMC8596493 | biostudies-literature
| S-EPMC7556384 | biostudies-literature