
Dataset Information


Estimation of DNA contamination and its sources in genotyped samples.


ABSTRACT: Array genotyping is a cost-effective and widely used tool that enables assessment of up to millions of genetic markers in hundreds of thousands of individuals. Genotyping array data are typically highly accurate but sensitive to mixing of DNA samples from multiple individuals before or during genotyping. Contaminated samples can lead to genotyping errors and consequently cause false positive signals or reduce the power of association analyses. Here, we propose a new method to identify contaminated samples and the sources of contamination within a genotyping batch. Through analysis of array intensity and genotype data from intentionally mixed samples and 22,366 samples of the Michigan Genomics Initiative, an ongoing biobank-based study, we show that our method can reliably estimate contamination. We also show that identifying sources of contamination can implicate problematic sample processing steps and guide process improvements. Compared to existing methods, our approach can estimate the proportion of contaminating DNA more accurately, eliminates the need for external databases of allele frequencies, and provides contamination estimates that are more robust to the ancestral origin of the contaminating sample.
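The abstract does not spell out the estimator. As a rough illustration of the general idea it describes (estimating a contamination fraction and picking a likely source from the other samples in the same batch, with no external allele-frequency database), here is a minimal sketch. It is not the published method. The simplifying assumptions are not taken from the paper: the B-allele frequency (BAF) at each SNP is modeled as a linear mixture of the sample's own genotype and one contaminant's genotype, genotypes are coded as B-allele counts (0, 1, 2), and the function names `estimate_alpha` and `best_source` are hypothetical.

```python
from typing import Dict, List, Optional, Tuple


def estimate_alpha(baf: List[float], g_sample: List[int],
                   g_cand: List[int]) -> Tuple[float, float]:
    """Least-squares contamination fraction alpha and residual sum of squares.

    Hypothetical model (an assumption, not the paper's method):
        baf = (1 - alpha) * g_sample / 2 + alpha * g_cand / 2
    which rearranges to  baf - g_sample/2 = alpha * (g_cand - g_sample) / 2,
    a regression through the origin.
    """
    sxy = sxx = 0.0
    for b, gs, gc in zip(baf, g_sample, g_cand):
        x = (gc - gs) / 2.0  # BAF shift if the candidate supplied all the DNA
        y = b - gs / 2.0     # observed deviation from the sample's expected BAF
        sxy += x * y
        sxx += x * x
    if sxx == 0.0:
        # Candidate genotypes match the sample everywhere: uninformative.
        return 0.0, float("inf")
    alpha = min(max(sxy / sxx, 0.0), 1.0)  # clamp to a valid fraction
    rss = sum((b - gs / 2.0 - alpha * (gc - gs) / 2.0) ** 2
              for b, gs, gc in zip(baf, g_sample, g_cand))
    return alpha, rss


def best_source(baf: List[float], g_sample: List[int],
                batch: Dict[str, List[int]]) -> Tuple[Optional[str], float]:
    """Return the batch member whose genotypes best explain the BAF deviations."""
    best_id: Optional[str] = None
    best_alpha, best_rss = 0.0, float("inf")
    for cand_id, g_cand in batch.items():
        alpha, rss = estimate_alpha(baf, g_sample, g_cand)
        if rss < best_rss:
            best_id, best_alpha, best_rss = cand_id, alpha, rss
    return best_id, best_alpha


if __name__ == "__main__":
    # Toy data: the sample is simulated as 10% contaminated by batch member "B".
    sample = [0, 2, 0, 2, 1, 0, 2, 0]
    batch = {"A": [1, 1, 0, 2, 0, 0, 2, 1], "B": [2, 0, 1, 2, 1, 2, 0, 0]}
    baf = [0.9 * gs / 2 + 0.1 * gc / 2 for gs, gc in zip(sample, batch["B"])]
    print(best_source(baf, sample, batch))
```

Because each candidate comes from the same batch, the only inputs are the sample's intensities (BAF) and the called genotypes of its batch-mates, which mirrors the abstract's point about not needing external allele frequencies. Real array data would need noise handling, intensity normalization and a test of whether any contamination is present at all; an uncontaminated sample will still report some "best" candidate, with an alpha near zero.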

SUBMITTER: Zajac GJM 

PROVIDER: S-EPMC6829038 | biostudies-literature | 2019 Dec

REPOSITORIES: biostudies-literature


Publications

Estimation of DNA contamination and its sources in genotyped samples.

Gregory J. M. Zajac, Lars G. Fritsche, Joshua S. Weinstock, Susan L. Dagenais, Robert H. Lyons, Chad M. Brummett, Gonçalo R. Abecasis

Genetic Epidemiology, issue 8 (2019-08-26)



Similar Datasets

S-EPMC3282701 | biostudies-literature
S-EPMC4222080 | biostudies-literature
S-EPMC7050530 | biostudies-literature
S-EPMC4601135 | biostudies-literature
S-EPMC7418405 | biostudies-literature
S-EPMC7233333 | biostudies-literature
S-EPMC4822957 | biostudies-literature
S-EPMC3396633 | biostudies-literature
S-EPMC5100944 | biostudies-literature
S-EPMC3022687 | biostudies-literature