
Dataset Information


Detecting and estimating contamination of human DNA samples in sequencing and array-based genotype data.


ABSTRACT: DNA sample contamination is a serious problem in DNA sequencing studies and may result in systematic genotype misclassification and false positive associations. Although methods exist to detect and filter out cross-species contamination, few methods to detect within-species sample contamination are available. In this paper, we describe methods to identify within-species DNA sample contamination based on (1) a combination of sequencing reads and array-based genotype data, (2) sequence reads alone, and (3) array-based genotype data alone. Analysis of sequencing reads allows contamination detection after sequence data is generated but prior to variant calling; analysis of array-based genotype data allows contamination detection prior to generation of costly sequence data. Through a combination of analysis of in silico and experimentally contaminated samples, we show that our methods can reliably detect and estimate levels of contamination as low as 1%. We evaluate the impact of DNA contamination on genotype accuracy and propose effective strategies to screen for and prevent DNA contamination in sequencing studies.
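The abstract does not give the estimator itself. The sketch below is a simplified illustration of approach (1), which combines sequencing reads with array-based genotypes. It is not the authors' implementation. It assumes that the sample's own genotypes are known from the array, that the contaminating DNA comes from a single unrelated individual whose genotype follows Hardy-Weinberg proportions at a known population allele frequency, and that sequencing errors occur at a fixed symmetric rate. Under these assumptions it grid-searches the contamination fraction `alpha` that maximizes a binomial mixture likelihood of the observed alternate-allele read counts. The function names and the per-site tuple layout `(g, p, alt, depth)` are hypothetical.

```python
import math

def binom_pmf(k, n, q):
    """Binomial probability of k successes in n trials with success probability q."""
    return math.comb(n, k) * q ** k * (1 - q) ** (n - k)

def site_likelihood(alpha, g, p, alt, depth, err=0.01):
    """Likelihood of `alt` alternate reads out of `depth` at one site.

    g: the sample's alternate-allele count (0, 1, 2) from the array genotype.
    p: population alternate-allele frequency (assumed known) for the contaminant.
    err: assumed symmetric per-read sequencing error rate.
    """
    # Hardy-Weinberg prior on the unknown contaminant genotype h = 0, 1, 2
    prior = [(1 - p) ** 2, 2 * p * (1 - p), p ** 2]
    total = 0.0
    for h in range(3):
        # Expected alternate-allele fraction in the mixed DNA
        f = (1 - alpha) * g / 2 + alpha * h / 2
        # Probability that a read shows the alternate allele, allowing for errors
        q = f * (1 - err) + (1 - f) * err
        total += prior[h] * binom_pmf(alt, depth, q)
    return total

def estimate_contamination(sites, step=0.001, max_alpha=0.5):
    """Grid-search the alpha in [0, max_alpha] that maximizes the total log-likelihood."""
    best_alpha, best_ll = 0.0, -math.inf
    n_steps = int(round(max_alpha / step))
    for i in range(n_steps + 1):
        alpha = i * step
        ll = sum(math.log(site_likelihood(alpha, g, p, alt, d))
                 for g, p, alt, d in sites)
        if ll > best_ll:
            best_alpha, best_ll = alpha, ll
    return best_alpha

# Usage: read counts that exactly match the array genotypes imply no contamination.
clean = [(0, 0.3, 0, 30), (1, 0.5, 15, 30), (2, 0.6, 30, 30)]
print(estimate_contamination(clean))  # 0.0
```

Homozygous sites carry most of the signal. When the sample is homozygous, any excess of reads for the other allele, beyond the error rate, points to foreign DNA. A real tool would also need to handle array genotyping errors, reference bias, and unknown allele frequencies, and this sketch handles none of them.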

SUBMITTER: Jun G 

PROVIDER: S-EPMC3487130 | biostudies-literature | 2012 Nov

REPOSITORIES: biostudies-literature


Publications

Detecting and estimating contamination of human DNA samples in sequencing and array-based genotype data.

Goo Jun, Matthew Flickinger, Kurt N. Hetrick, Jane M. Romm, Kimberly F. Doheny, Gonçalo R. Abecasis, Michael Boehnke, Hyun Min Kang

American Journal of Human Genetics, 2012 Oct 25, issue 5



Similar Datasets

| S-EPMC3167057 | biostudies-literature
| S-EPMC3824133 | biostudies-literature
| S-EPMC10327099 | biostudies-literature
| S-EPMC4222080 | biostudies-literature
| S-EPMC9481068 | biostudies-literature
| S-EPMC4573246 | biostudies-literature
| S-EPMC3268604 | biostudies-literature
| S-EPMC8275324 | biostudies-literature
| S-EPMC2722999 | biostudies-literature
| S-EPMC8566571 | biostudies-literature