
Dataset Information


Correcting for Sample Contamination in Genotype Calling of DNA Sequence Data.


ABSTRACT: DNA sample contamination is a frequent problem in DNA sequencing studies and can result in genotyping errors and reduced power for association testing. We recently described methods to identify within-species DNA sample contamination based on sequencing read data, showed that our methods can reliably detect and estimate contamination levels as low as 1%, and suggested strategies to identify and remove contaminated samples from sequencing studies. Here we propose methods to model contamination during genotype calling as an alternative to removal of contaminated samples from further analyses. We compare our contamination-adjusted calls to calls that ignore contamination and to calls based on uncontaminated data. We demonstrate that, for moderate contamination levels (5%-20%), contamination-adjusted calls eliminate 48%-77% of the genotyping errors. For lower levels of contamination, our contamination correction methods produce genotypes nearly as accurate as those based on uncontaminated data. Our contamination correction methods are useful generally, but are particularly helpful for sample contamination levels from 2% to 20%.
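
The idea of modeling contamination during genotype calling can be illustrated with a small sketch. The code below is not the authors' implementation. It assumes a simple two-sample mixture at a biallelic site: each read comes from the contaminating sample with probability alpha, the contaminant's genotype is unknown and is averaged over Hardy-Weinberg proportions at a known population alternate-allele frequency, and base errors flip between the two alleles. The function names, the read representation, and the error model are all hypothetical choices made for illustration.

```python
def read_alt_prob(g_sample, g_contam, alpha):
    """Probability that a read carries the alternate allele before base errors.

    Assumes (illustrative model) a fraction `alpha` of reads comes from the
    contaminating sample; genotypes are alternate-allele counts 0, 1, or 2.
    """
    return (1 - alpha) * g_sample / 2 + alpha * g_contam / 2


def genotype_likelihoods(reads, alpha, alt_freq):
    """Likelihood of the reads for each sample genotype (0, 1, 2).

    reads:    list of (is_alt, error_prob) pairs, one per read.
    alpha:    assumed contamination fraction (0 means no contamination).
    alt_freq: assumed population alternate-allele frequency, used as a
              Hardy-Weinberg prior on the unknown contaminant genotype.
    """
    f = alt_freq
    contam_prior = [(1 - f) ** 2, 2 * f * (1 - f), f ** 2]
    likelihoods = []
    for g_sample in (0, 1, 2):
        total = 0.0
        # Marginalize over the unknown genotype of the contaminating sample.
        for g_contam, prior in enumerate(contam_prior):
            p_alt = read_alt_prob(g_sample, g_contam, alpha)
            lik = 1.0
            for is_alt, err in reads:
                # A base error swaps the observed allele (two-allele simplification).
                p_obs_alt = p_alt * (1 - err) + (1 - p_alt) * err
                lik *= p_obs_alt if is_alt else 1 - p_obs_alt
            total += prior * lik
        likelihoods.append(total)
    return likelihoods


def call_genotype(reads, alpha, alt_freq):
    """Return the genotype (0, 1, or 2) with the highest likelihood."""
    liks = genotype_likelihoods(reads, alpha, alt_freq)
    return max(range(3), key=lambda g: liks[g])


# Toy data: 20 reads, 3 showing the alternate allele, base error 0.001.
reads = [(True, 0.001)] * 3 + [(False, 0.001)] * 17
unadjusted = call_genotype(reads, alpha=0.0, alt_freq=0.2)
adjusted = call_genotype(reads, alpha=0.1, alt_freq=0.2)
print(unadjusted, adjusted)
```

In this toy setup, ignoring contamination (alpha = 0) favors a heterozygous call because three alternate reads are hard to explain as errors, whereas the call that allows for 10% contamination favors homozygous reference, since a few stray alternate reads are plausible from a contaminating sample. Real callers work in log space and use per-base quality scores; the sketch omits both for brevity.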

SUBMITTER: Flickinger M 

PROVIDER: S-EPMC4573246 | biostudies-literature | 2015 Aug

REPOSITORIES: biostudies-literature


Publications

Correcting for Sample Contamination in Genotype Calling of DNA Sequence Data.

Matthew Flickinger, Goo Jun, Gonçalo R. Abecasis, Michael Boehnke, Hyun Min Kang

American Journal of Human Genetics, 2015-07-30, issue 2



Similar Datasets

| S-EPMC3404070 | biostudies-literature
| S-EPMC7050530 | biostudies-literature
| S-EPMC11348166 | biostudies-literature
| S-EPMC5427492 | biostudies-literature
| S-EPMC2639077 | biostudies-literature
| S-EPMC4601135 | biostudies-literature
| S-EPMC4498232 | biostudies-literature
| S-EPMC3487130 | biostudies-literature
| S-EPMC3338331 | biostudies-literature
| S-EPMC1280925 | biostudies-literature