
Dataset Information


Who's Who? Detecting and Resolving Sample Anomalies in Human DNA Sequencing Studies with Peddy.


ABSTRACT: The potential for genetic discovery in human DNA sequencing studies is greatly diminished if DNA samples from a cohort are mislabeled, swapped, or contaminated or if they include unintended individuals. Unfortunately, the potential for such errors is significant since DNA samples are often manipulated by several protocols, labs, or scientists in the process of sequencing. We have developed a software package, peddy, to identify and facilitate the remediation of such errors via interactive visualizations and reports comparing the stated sex, relatedness, and ancestry to what is inferred from the individual genotypes derived from whole-genome (WGS) or whole-exome (WES) sequencing. Peddy predicts a sample's ancestry using a machine learning model trained on individuals of diverse ancestries from the 1000 Genomes Project reference panel. Peddy facilitates both automated and interactive, visual detection of sample swaps, poor sequencing quality, and other indicators of sample problems that, if left undetected, would inhibit discovery.
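EXAMPLE (illustrative sketch, not peddy's own code): The abstract describes comparing each sample's stated sex to the sex inferred from its genotypes. One plausible way to do that is to look at the fraction of heterozygous calls on the X chromosome, which should be near zero for a sample with a single X. The `Sample` class, the function names, the genotype encoding (alternate-allele counts 0, 1, 2), and the 0.1 cutoff below are all assumptions made for illustration; they are not taken from peddy or from the paper.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

# Hypothetical cutoff on the chrX heterozygous-call fraction.
# This value is an assumption for illustration, not peddy's threshold.
HET_RATIO_CUTOFF = 0.1


@dataclass
class Sample:
    sample_id: str
    stated_sex: str          # "male" or "female", as recorded in the manifest
    x_genotypes: List[int]   # alt-allele counts (0, 1, 2) at chrX sites


def x_het_ratio(genotypes: List[int]) -> Optional[float]:
    """Return the fraction of called chrX genotypes that are heterozygous."""
    called = [g for g in genotypes if g in (0, 1, 2)]
    if not called:
        return None  # no usable calls, so no ratio can be computed
    return sum(1 for g in called if g == 1) / len(called)


def infer_sex(genotypes: List[int], cutoff: float = HET_RATIO_CUTOFF) -> str:
    """Infer sex from chrX heterozygosity; 'unknown' if there are no calls."""
    ratio = x_het_ratio(genotypes)
    if ratio is None:
        return "unknown"
    return "female" if ratio > cutoff else "male"


def flag_sex_mismatches(samples: List[Sample]) -> List[Tuple[str, str, str]]:
    """List (sample_id, stated_sex, inferred_sex) where the two disagree."""
    flagged = []
    for s in samples:
        inferred = infer_sex(s.x_genotypes)
        if inferred != "unknown" and inferred != s.stated_sex:
            flagged.append((s.sample_id, s.stated_sex, inferred))
    return flagged


# Toy cohort: S3 is recorded as female but has no heterozygous chrX calls.
cohort = [
    Sample("S1", "male", [0, 2, 0, 2, 0, 0, 2, 0]),
    Sample("S2", "female", [0, 1, 2, 1, 0, 1, 0, 1]),
    Sample("S3", "female", [2, 0, 0, 2, 0, 2, 0, 0]),
]
print(flag_sex_mismatches(cohort))
```

In this toy cohort only S3 is flagged, which is the kind of discrepancy that could point to a sample swap or a mislabeled record. A real analysis would restrict the sites to the non-pseudoautosomal part of chrX and would need a cutoff chosen from the data.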

SUBMITTER: Pedersen BS 

PROVIDER: S-EPMC5339084 | biostudies-literature | 2017 Mar

REPOSITORIES: biostudies-literature


Publications

Who's Who? Detecting and Resolving Sample Anomalies in Human DNA Sequencing Studies with Peddy.

Pedersen, Brent S.; Quinlan, Aaron R.

American Journal of Human Genetics, 2017-02-09, issue 3



Similar Datasets

Date | Accession | Repository
- | S-EPMC2917713 | biostudies-literature
- | S-EPMC8164492 | biostudies-literature
- | S-EPMC4593964 | biostudies-other
2011-02-12 | E-GEOD-27239 | biostudies-arrayexpress
2011-02-12 | GSE27239 | GEO
- | S-EPMC3487130 | biostudies-literature
- | S-EPMC6578590 | biostudies-literature
- | S-EPMC8275324 | biostudies-literature
- | S-EPMC8046810 | biostudies-literature
- | S-EPMC4317254 | biostudies-literature