Dataset Information

SMaSH: Sample matching using SNPs in humans.


ABSTRACT: BACKGROUND: Inadvertent sample swaps are a real threat to data quality in any medium- to large-scale omics study. While matches between samples from the same individual can in principle be identified from a few well-characterized single nucleotide polymorphisms (SNPs), omics data types often provide only low to moderate coverage, thus requiring integration of evidence from a large number of SNPs to determine whether two samples derive from the same individual. METHODS: We select about six thousand SNPs in the human genome and develop a Bayesian framework that robustly identifies sample matches between next-generation sequencing data sets. RESULTS: We validate our approach on a variety of data sets. Most importantly, we show that our approach can establish identity between different omics data types such as Exome, RNA-Seq, and MethylCap-Seq. We demonstrate how identity detection degrades with sample quality and read coverage, but show that twenty million reads of a fairly low-quality RNA-Seq sample are still sufficient for reliable sample identification. CONCLUSION: Our tool, SMaSH, is able to identify sample mismatches in next-generation sequencing data sets between different sequencing modalities and for low-quality sequencing data.

SUBMITTER: Westphal M 

PROVIDER: S-EPMC6936078 | biostudies-literature | 2019 Dec

REPOSITORIES: biostudies-literature

Publications

SMaSH: Sample matching using SNPs in humans.

Maximillian Westphal, David Frankhouser, Carmine Sonzone, Peter G Shields, Pearlly Yan, Ralf Bundschuh

BMC Genomics, 2019-12-30, Suppl 12


Similar Datasets

| S-EPMC5013917 | biostudies-literature
| S-EPMC4417508 | biostudies-literature
2006-11-21 | GSE6306 | GEO
| S-EPMC8329933 | biostudies-literature
| S-EPMC6412123 | biostudies-literature
2010-10-08 | E-GEOD-6306 | biostudies-arrayexpress
| S-EPMC3156813 | biostudies-literature
| S-EPMC9038198 | biostudies-literature
| S-EPMC4173010 | biostudies-literature