Dataset Information

A robust clustering algorithm for identifying problematic samples in genome-wide association studies.

ABSTRACT:

Summary

High-throughput genotyping arrays provide an efficient way to survey single nucleotide polymorphisms (SNPs) across the genome in large numbers of individuals. Downstream analysis of the data, for example in genome-wide association studies (GWAS), often involves statistical models of genotype frequencies across individuals. The complexities of the sample collection process and the potential for errors in the experimental assay can lead to biases and artefacts in an individual's inferred genotypes. Rather than attempting to model these complications, it has become a standard practice to remove individuals whose genome-wide data differ from the sample at large. Here we describe a simple, but robust, statistical algorithm to identify samples with atypical summaries of genome-wide variation. Its use as a semi-automated quality control tool is demonstrated using several summary statistics, selected to identify different potential problems, and it is applied to two different genotyping platforms and sample collections.
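To make the idea of flagging samples with atypical genome-wide summaries concrete, here is a minimal sketch in Python. It is not the published algorithm and does not reproduce the authors' R code. It swaps in a much simpler robust rule, median and median absolute deviation (MAD) scores computed for each summary statistic, and flags a sample when any score exceeds a cutoff. The function names, the cutoff of 3.0, the choice of heterozygosity and missingness as summaries, and all data values are assumptions made for illustration.

```python
import statistics
from typing import Dict, List


def robust_z_scores(values: List[float]) -> List[float]:
    """Score each value as (x - median) / (1.4826 * MAD).

    The factor 1.4826 makes the MAD comparable to a standard deviation
    when the bulk of the data is approximately normal.
    """
    med = statistics.median(values)
    mad = statistics.median(abs(v - med) for v in values)
    scale = 1.4826 * mad
    if scale == 0:
        raise ValueError("MAD is zero; robust scores are undefined")
    return [(v - med) / scale for v in values]


def flag_atypical_samples(
    summaries: Dict[str, List[float]], threshold: float = 3.0
) -> List[int]:
    """Return sorted indices of samples whose robust score exceeds the
    threshold (an assumed cutoff) on at least one summary statistic."""
    flagged = set()
    for values in summaries.values():
        for index, score in enumerate(robust_z_scores(values)):
            if abs(score) > threshold:
                flagged.add(index)
    return sorted(flagged)


# Hypothetical per-sample summaries for eight samples (illustrative values only).
summaries = {
    "heterozygosity": [0.310, 0.312, 0.308, 0.311, 0.309, 0.350, 0.310, 0.313],
    "missingness": [0.010, 0.012, 0.011, 0.009, 0.080, 0.010, 0.011, 0.012],
}

flagged = flag_atypical_samples(summaries)
print("Samples flagged for review:", flagged)
```

Because the median and MAD are barely affected by a few extreme values, the problematic samples do not distort the scale against which they are judged. In a semi-automated workflow the flagged samples would be inspected rather than dropped outright.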

Availability

The algorithm is written in R and is freely available at www.well.ox.ac.uk/chris-spencer

Contact

chris.spencer@well.ox.ac.uk

Supplementary information

Supplementary data are available at Bioinformatics online.

SUBMITTER: Bellenguez C

PROVIDER: S-EPMC3244763 | biostudies-literature | 2012 Jan

REPOSITORIES: biostudies-literature

Publications

A robust clustering algorithm for identifying problematic samples in genome-wide association studies.

Céline Bellenguez, Amy Strange, Colin Freeman, Peter Donnelly, Chris C A Spencer

Bioinformatics (Oxford, England), 2011-11-03, issue 1

PMID: 22057162

Similar Datasets

Model-based clustering for identifying disease-associated SNPs in case-control genome-wide association studies. | S-EPMC6757104 | biostudies-literature

Robust relationship inference in genome-wide association studies. | S-EPMC3025716 | biostudies-literature

Robust Reference Powered Association Test of Genome-Wide Association Studies. | S-EPMC6465778 | biostudies-literature

Identifying disease associations via genome-wide association studies. | S-EPMC2648782 | biostudies-literature

A robust method for testing association in genome-wide association studies. | S-EPMC3322627 | biostudies-literature

Robust Association Tests for the Replication of Genome-Wide Association Studies. | S-EPMC4539975 | biostudies-literature

Robust Gene-Gene Interaction Analysis in Genome Wide Association Studies. | S-EPMC4534386 | biostudies-literature

Robust methods for population stratification in genome wide association studies. | S-EPMC3637636 | biostudies-literature

Genetic clustering on the hippocampal surface for genome-wide association studies. | S-EPMC4024454 | biostudies-literature

A hierarchical and modular approach to the discovery of robust associations in genome-wide association studies from pooled DNA samples. | S-EPMC2248205 | biostudies-literature