Dataset Information

Quality control and quality assurance in genotypic data for genome-wide association studies.

ABSTRACT: Genome-wide scans of nucleotide variation in human subjects are providing an increasing number of replicated associations with complex disease traits. Most of the variants detected have small effects and, collectively, they account for a small fraction of the total genetic variance. Very large sample sizes are required to identify and validate findings. In this situation, even small sources of systematic or random error can cause spurious results or obscure real effects. The need for careful attention to data quality has been appreciated for some time in this field, and a number of strategies for quality control and quality assurance (QC/QA) have been developed. Here we extend these methods and describe a system of QC/QA for genotypic data in genome-wide association studies (GWAS). This system includes some new approaches that (1) combine analysis of allelic probe intensities and called genotypes to distinguish gender misidentification from sex chromosome aberrations, (2) detect autosomal chromosome aberrations that may affect genotype calling accuracy, (3) infer DNA sample quality from relatedness and allelic intensities, (4) use duplicate concordance to infer SNP quality, (5) detect genotyping artifacts from dependence of Hardy-Weinberg equilibrium test P-values on allelic frequency, and (6) demonstrate sensitivity of principal components analysis to SNP selection. The methods are illustrated with examples from the "Gene Environment Association Studies" (GENEVA) program. The results suggest several recommendations for QC/QA in the design and execution of GWAS.
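The abstract lists several per-SNP checks without showing how they are computed. Below is a minimal Python sketch of items (4) and (5). It assumes each genotype is coded as 0, 1 or 2 copies of a "B" allele, with None for a missing call. The function names are hypothetical and not taken from the paper. The HWE test is a simple 1-df chi-square test, which may not be the test the authors used; exact tests are often preferred when genotype counts are small.

```python
import math
from typing import Optional, Sequence

# Assumed coding: number of "B" alleles (0, 1, 2); None marks a missing call.
Genotype = Optional[int]


def duplicate_discordance(calls_a: Sequence[Genotype],
                          calls_b: Sequence[Genotype]) -> tuple[int, int]:
    """Count discordant calls for one SNP across duplicate sample pairs.

    calls_a[i] and calls_b[i] are the calls for the two members of duplicate
    pair i. Pairs with a missing call are skipped. Returns (discordant, compared).
    """
    discordant = compared = 0
    for a, b in zip(calls_a, calls_b):
        if a is None or b is None:
            continue
        compared += 1
        if a != b:
            discordant += 1
    return discordant, compared


def allele_frequency(calls: Sequence[Genotype]) -> float:
    """Frequency of the B allele among non-missing calls."""
    observed = [g for g in calls if g is not None]
    if not observed:
        raise ValueError("no non-missing calls")
    return sum(observed) / (2 * len(observed))


def minor_allele_frequency(calls: Sequence[Genotype]) -> float:
    """Frequency of whichever allele is rarer."""
    p = allele_frequency(calls)
    return min(p, 1 - p)


def hwe_chisq_pvalue(calls: Sequence[Genotype]) -> float:
    """1-df chi-square test of Hardy-Weinberg equilibrium for one SNP."""
    observed = [g for g in calls if g is not None]
    n = len(observed)
    counts = [observed.count(k) for k in (0, 1, 2)]  # AA, AB, BB
    p = allele_frequency(observed)  # B allele frequency
    q = 1 - p                       # A allele frequency
    expected = [n * q * q, 2 * n * p * q, n * p * p]
    if min(expected) == 0:
        return 1.0  # monomorphic SNP: the test is undefined, report no evidence
    stat = sum((o - e) ** 2 / e for o, e in zip(counts, expected))
    # Chi-square survival function with 1 df: P(X > stat) = erfc(sqrt(stat / 2)).
    return math.erfc(math.sqrt(stat / 2))
```

Computed for every SNP, the discordance counts identify SNPs whose calls disagree between duplicate samples. The HWE P-values can be plotted against minor allele frequency to look for the frequency dependence described in item (5).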

SUBMITTER: Laurie CC

PROVIDER: S-EPMC3061487 | biostudies-literature | 2010 Sep

REPOSITORIES: biostudies-literature

Publications

Quality control and quality assurance in genotypic data for genome-wide association studies.

Laurie CC, Doheny KF, Mirel DB, Pugh EW, Bierut LJ, Bhangale T, Boehm F, Caporaso NE, Cornelis MC, Edenberg HJ, Gabriel SB, Harris EL, Hu FB, Jacobs KB, Kraft P, Landi MT, Lumley T, Manolio TA, McHugh C, Painter I, Paschall J, Rice JP, Rice KM, Zheng X, Weir BS

Genetic Epidemiology, 2010 Sep 1, issue 6

PMID: 20718045

Similar Datasets

Automated quality control for genome wide association studies.
| S-EPMC5007749 | biostudies-other

Quality control procedures for genome-wide association studies.
| S-EPMC3066182 | biostudies-literature

A quality control algorithm for filtering SNPs in genome-wide association studies.
| S-EPMC2894516 | biostudies-literature

Genome-wide association studies and epigenome-wide association studies go together in cancer control.
| S-EPMC5551540 | biostudies-literature

Genome-wide epigenetic data facilitate understanding of disease susceptibility association studies.
| S-EPMC3438926 | biostudies-other

Neighbor GWAS: incorporating neighbor genotypic identity into genome-wide association studies of field herbivory.
| S-EPMC8115658 | biostudies-literature

Genome-wide association studies for bivariate sparse longitudinal data.
| S-EPMC3725885 | biostudies-literature

Genome-wide association studies of grain quality traits in maize.
| S-EPMC8105333 | biostudies-literature

A data-driven weighting scheme for family-based genome-wide association studies.
| S-EPMC2858789 | biostudies-other

Quality control and conduct of genome-wide association meta-analyses.
| S-EPMC4083217 | biostudies-literature