Dataset Information

Using Mendelian inheritance errors as quality control criteria in whole genome sequencing data set.


ABSTRACT: Although the technical and analytic complexity of whole genome sequencing is generally appreciated, best practices for data cleaning and quality control have not been defined. Family-based data can be used to guide the standardization of specific quality control metrics in nonfamily-based data. Given the low mutation rate, Mendelian inheritance errors are most likely the result of erroneous genotype calls. Thus, our goal was to identify the characteristics that determine Mendelian inheritance errors. To accomplish this, we used chromosome 3 whole genome sequencing family-based data from the Genetic Analysis Workshop 18 (GAW18). Mendelian inheritance errors were provided as part of the GAW18 data set. Additionally, for binary variants we calculated Mendelian inheritance errors using PLINK. Based on our analysis, nonbinary single-nucleotide variants have an inherently high number of Mendelian inheritance errors. Furthermore, in binary variants, Mendelian inheritance errors are not randomly distributed. Indeed, we identified 3 Mendelian inheritance error peaks that were enriched with repetitive elements. However, these peaks can be reduced by applying a single filter from the sequencing file. In summary, we demonstrated that erroneous sequencing calls are nonrandomly distributed across the genome and that quality control metrics can dramatically reduce the number of Mendelian inheritance errors. Appropriate quality control will allow optimal use of genetic data to realize the full potential of whole genome sequencing.
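
As a rough illustration of what a Mendelian inheritance error means for a binary variant on an autosome such as chromosome 3, the sketch below checks whether a child's genotype can be formed from one allele of each parent. It is not PLINK's implementation and not the authors' pipeline; the function names and the 0/1/2 alternate-allele-count encoding are assumptions made for this example, and missing genotypes are not handled.

```python
from itertools import product

# Hypothetical encoding: a genotype is the number of alternate alleles (0, 1, or 2)
# carried at a binary (biallelic) autosomal variant.
GAMETES = {0: {0}, 1: {0, 1}, 2: {1}}  # alleles a parent with that genotype can transmit


def possible_child_genotypes(father, mother):
    """Return every child genotype that is consistent with both parents."""
    return {a + b for a, b in product(GAMETES[father], GAMETES[mother])}


def is_mendelian_error(father, mother, child):
    """True if the child's genotype cannot arise from the parental genotypes."""
    return child not in possible_child_genotypes(father, mother)


def count_mendelian_errors(trios):
    """Count inconsistent trios in an iterable of (father, mother, child) genotypes."""
    return sum(is_mendelian_error(f, m, c) for f, m, c in trios)


if __name__ == "__main__":
    # Illustrative genotypes only, not data from the GAW18 data set.
    example_trios = [(0, 0, 1), (0, 2, 1), (1, 1, 2), (2, 2, 1)]
    print(count_mendelian_errors(example_trios))
```

Counting such inconsistencies per variant, and then looking at how the counts are distributed along the chromosome, is the kind of summary the abstract describes.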

SUBMITTER: Pilipenko VV 

PROVIDER: S-EPMC4144465 | biostudies-literature | 2014

REPOSITORIES: biostudies-literature

Publications

Using Mendelian inheritance errors as quality control criteria in whole genome sequencing data set.

Pilipenko Valentina V, He Hua, Kurowski Brad G, Alexander Eileen S, Zhang Xue, Ding Lili, Mersha Tesfaye B, Kottyan Leah, Fardo David W, Martin Lisa J

BMC proceedings, 2014-06-17, Suppl 1, Genetic Analysis Workshop 18, Vanessa Olmo


Similar Datasets

| S-EPMC4240813 | biostudies-literature
| S-EPMC4211433 | biostudies-other
| S-EPMC5702572 | biostudies-literature
| S-EPMC5737511 | biostudies-literature
| S-EPMC5829578 | biostudies-literature
| S-EPMC6834861 | biostudies-literature
| S-EPMC7587219 | biostudies-literature
| S-EPMC10963753 | biostudies-literature
| S-EPMC5905984 | biostudies-literature
| S-EPMC4171678 | biostudies-literature