Dataset Information

Estimating genotype error rates from high-coverage next-generation sequence data.

ABSTRACT: Exome and whole-genome sequencing studies are becoming increasingly common, but little is known about the accuracy of the genotype calls made by the commonly used platforms. Here we use replicate high-coverage sequencing of blood and saliva DNA samples from four European-American individuals to estimate lower bounds on the error rates of Complete Genomics and Illumina HiSeq whole-genome and whole-exome sequencing. Error rates for nonreference genotype calls range from 0.1% to 0.6%, depending on the platform and the depth of coverage. Additionally, we found (1) no difference in the error profiles or rates between blood and saliva samples; (2) Complete Genomics sequences had substantially higher error rates than Illumina sequences had; (3) error rates were higher (up to 6%) for rare or unique variants; (4) error rates generally declined with genotype quality (GQ) score, but in a nonlinear fashion for the Illumina data, likely due to loss of specificity of GQ scores greater than 60; and (5) error rates increased with increasing depth of coverage for the Illumina data. These findings, especially (3)-(5), suggest that caution should be taken in interpreting the results of next-generation sequencing-based association studies, and even more so in clinical application of this technology in the absence of validation by other more robust sequencing or genotyping methods.
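The replicate-based lower-bound idea in the abstract can be illustrated with a simplified sketch. This is an assumption-laden illustration, not the authors' published procedure. If two replicate calls at the same site disagree, at least one of them must be wrong, so the count of discordant sites is a lower bound on the number of erroneous calls. An error made identically in both replicates goes undetected, which is why the result is only a lower bound. The function name `discordance_lower_bound`, the genotype encoding (a tuple of allele indices, with 0 as the reference allele), and the rule of comparing only sites where at least one replicate has a nonreference call are all hypothetical choices made for this sketch.

```python
from typing import Dict, Tuple

# Hypothetical encoding: a diploid genotype as a pair of allele indices,
# where 0 is the reference allele, e.g. (0, 1) = heterozygous.
Genotype = Tuple[int, int]
REF_HOM = (0, 0)


def normalize(gt: Genotype) -> Tuple[int, ...]:
    """Make unphased genotypes comparable: (1, 0) and (0, 1) are the same call."""
    return tuple(sorted(gt))


def discordance_lower_bound(rep1: Dict[str, Genotype],
                            rep2: Dict[str, Genotype]) -> float:
    """Lower bound on the per-call error rate from two replicate call sets.

    Only sites called in both replicates, with a nonreference call in at
    least one replicate, are compared (an assumption of this sketch).
    Each discordant site holds at least one error among its two calls,
    so errors / (2 * compared sites) >= discordant / (2 * compared sites).
    """
    compared = 0
    discordant = 0
    for site in set(rep1) & set(rep2):
        g1, g2 = normalize(rep1[site]), normalize(rep2[site])
        if g1 == REF_HOM and g2 == REF_HOM:
            continue  # skip sites where both replicates are homozygous reference
        compared += 1
        if g1 != g2:
            discordant += 1
    if compared == 0:
        return 0.0
    return discordant / (2 * compared)
```

For example, with four sites of which one is homozygous reference in both replicates and one of the remaining three is discordant, the sketch returns 1/6: one guaranteed error among six compared calls.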

SUBMITTER: Wall JD

PROVIDER: S-EPMC4216915 | biostudies-literature | 2014 Nov

REPOSITORIES: biostudies-literature

Publications

Estimating genotype error rates from high-coverage next-generation sequence data.

Wall JD, Tang LF, Zerbe B, Kvale MN, Kwok PY, Schaefer C, Risch N

Genome Research, 2014 Oct 10; issue 11

PMID: 25304867

Similar Datasets

Detecting identity by descent and estimating genotype error rates in sequence data.
| S-EPMC3824133 | biostudies-literature

Analysis of error profiles in deep next-generation sequencing data.
| S-EPMC6417284 | biostudies-literature

Estimating individual admixture proportions from next generation sequencing data.
| S-EPMC3813857 | biostudies-literature

Estimating time of HIV-1 infection from next-generation sequence diversity.
| S-EPMC5638550 | biostudies-literature

Benchmarking of computational error-correction methods for next-generation sequencing data.
| S-EPMC7079412 | biostudies-literature

Systematic evaluation of error rates and causes in short samples in next-generation sequencing.
| S-EPMC6053417 | biostudies-literature

HIVE-hexagon: high-performance, parallelized sequence alignment for next-generation sequencing data analysis.
| S-EPMC4053384 | biostudies-literature

PurBayes: estimating tumor cellularity and subclonality in next-generation sequencing data.
| S-EPMC3712213 | biostudies-literature

Light-RCV: a lightweight read coverage viewer for next generation sequencing data.
| S-EPMC4682413 | biostudies-literature

Ultrasensitive amplicon barcoding for next-generation sequencing facilitating sequence error and amplification-bias correction.
| S-EPMC7324614 | biostudies-literature