Dataset Information

Simple and efficient analysis of disease association with missing genotype data.

ABSTRACT: Missing genotype data arise in association studies when the single-nucleotide polymorphisms (SNPs) on the genotyping platform are not assayed successfully, when the SNPs of interest are not on the platform, or when total sequence variation is determined only on a small fraction of individuals. We present a simple and flexible likelihood framework to study SNP-disease associations with such missing genotype data. Our likelihood makes full use of all available data in case-control studies and reference panels (e.g., the HapMap), and it properly accounts for the biased nature of the case-control sampling as well as the uncertainty in inferring unknown variants. The corresponding maximum-likelihood estimators for genetic effects and gene-environment interactions are unbiased and statistically efficient. We developed fast and stable numerical algorithms to calculate the maximum-likelihood estimators and their variances, and we implemented these algorithms in a freely available computer program. Simulation studies demonstrated that the new approach is more powerful than existing methods while providing accurate control of the type I error. An application to a case-control study on rheumatoid arthritis revealed several loci that deserve further investigations.

SUBMITTER: Lin DY

PROVIDER: S-EPMC2427170 | biostudies-literature | 2008 Feb

REPOSITORIES: biostudies-literature

Publications

Simple and efficient analysis of disease association with missing genotype data.

Lin DY, Hu Y, Huang BE

American Journal of Human Genetics, 2008-02-01, issue 2

PMID: 18252224

Similar Datasets

Efficient whole-genome association mapping using local phylogenies for unphased genotype data. | S-EPMC2553438 | biostudies-literature

Genotype Representation Graphs: Enabling Efficient Analysis of Biobank-Scale Data. | S-EPMC11071416 | biostudies-literature

Simple descriptive missing data indicators in longitudinal studies with attrition, intermittent missing data and a high number of follow-ups. | S-EPMC5809924 | biostudies-literature

Efficient genotype compression and analysis of large genetic-variation data sets. | S-EPMC4697868 | biostudies-literature

The M-Value: A Simple Sensitivity Analysis for Bias Due to Missing Data in Treatment Effect Estimates. | S-EPMC10089074 | biostudies-literature

FINEMAP-miss: fine-mapping genome-wide association studies with missing genotype information. | S-EPMC12668598 | biostudies-literature

A simple and efficient algorithm for genome-wide homozygosity analysis in disease. | S-EPMC2758715 | biostudies-literature

Imputation and Missing Indicators for Handling Missing Longitudinal Data: Data Simulation Analysis Based on Electronic Health Record Data. | S-EPMC11924964 | biostudies-literature

A simple method for analyzing data from a randomized trial with a missing binary outcome. | S-EPMC194902 | biostudies-literature

NApy: efficient statistics in Python for large-scale heterogeneous data with enhanced support for missing data. | S-EPMC12741953 | biostudies-literature