Dataset Information

Integrative analysis of sequencing and array genotype data for discovering disease associations with rare mutations.

ABSTRACT: In the large cohorts that have been used for genome-wide association studies (GWAS), it is prohibitively expensive to sequence all cohort members. A cost-effective strategy is to sequence subjects with extreme values of quantitative traits or those with specific diseases. By imputing the sequencing data from the GWAS data for the cohort members who are not selected for sequencing, one can dramatically increase the number of subjects with information on rare variants. However, ignoring the uncertainties of imputed rare variants in downstream association analysis will inflate the type I error when sequenced subjects are not a random subset of the GWAS subjects. In this article, we provide a valid and efficient approach to combining observed and imputed data on rare variants. We consider commonly used gene-level association tests, all of which are constructed from the score statistic for assessing the effects of individual variants on the trait of interest. We show that the score statistic based on the observed genotypes for sequenced subjects and the imputed genotypes for nonsequenced subjects is unbiased. We derive a robust variance estimator that reflects the true variability of the score statistic regardless of the sampling scheme and imputation quality, such that the corresponding association tests always have correct type I error. We demonstrate through extensive simulation studies that the proposed tests are substantially more powerful than the use of accurately imputed variants only and the use of sequencing data alone. We provide an application to the Women's Health Initiative. The relevant software is freely available.
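
The score statistic described in the abstract can be illustrated with a minimal sketch. This is not the authors' software: it assumes a linear null model with an intercept only, made-up toy data, and hypothetical function names (`combined_genotypes`, `score_statistic`). It only shows how observed genotypes for sequenced subjects and imputed dosages for nonsequenced subjects enter a single per-variant score statistic. The robust variance estimator from the article is not reproduced here.

```python
from statistics import fmean


def combined_genotypes(observed, imputed, sequenced):
    """Use the observed genotype for sequenced subjects, the imputed dosage otherwise."""
    return [o if s else d for o, d, s in zip(observed, imputed, sequenced)]


def score_statistic(trait, genotypes):
    """Per-variant score statistic under an intercept-only linear null model
    (an assumption of this sketch): U = sum_i (y_i - mean(y)) * g_i."""
    y_bar = fmean(trait)
    return sum((y - y_bar) * g for y, g in zip(trait, genotypes))


# Toy data, invented for illustration only.
trait = [2.0, 0.0, 1.0, 1.0]
observed = [1, 0, None, None]       # sequenced genotypes; None = not sequenced
imputed = [0.9, 0.1, 0.5, 0.0]      # imputed dosages from array (GWAS) data
sequenced = [True, True, False, False]

g = combined_genotypes(observed, imputed, sequenced)
u = score_statistic(trait, g)
```

A gene-level test would combine such per-variant statistics across the variants in a gene. A valid test also needs a variance estimate that accounts for imputation uncertainty and the sampling scheme, which is the part this sketch leaves out.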

SUBMITTER: Hu YJ

PROVIDER: S-EPMC4313847 | biostudies-literature | 2015 Jan

REPOSITORIES: biostudies-literature

Publications

Integrative analysis of sequencing and array genotype data for discovering disease associations with rare mutations.

Yi-Juan Hu, Yun Li, Paul L. Auer, Dan-Yu Lin

Proceedings of the National Academy of Sciences of the United States of America, 2015-01-12, issue 4

PMID: 25583502

Similar Datasets

Discovering disease-disease associations by fusing systems-level molecular data. | S-EPMC3828568 | biostudies-other

Discovering disease associations by integrating electronic clinical data and medical literature. | S-EPMC3121722 | biostudies-literature

Incremental data integration for tracking genotype-disease associations. | S-EPMC7004389 | biostudies-literature

Comparison of Genotype Imputation for SNP Array and Low-Coverage Whole-Genome Sequencing Data. | S-EPMC8762119 | biostudies-literature

Ontology-guided data preparation for discovering genotype-phenotype relationships. | S-EPMC2367630 | biostudies-literature

Detecting and estimating contamination of human DNA samples in sequencing and array-based genotype data. | S-EPMC3487130 | biostudies-literature

Meta-analysis for Discovering Rare-Variant Associations: Statistical Methods and Software Programs. | S-EPMC4571037 | biostudies-literature

Rarity: discovering rare cell populations from single-cell imaging data. | S-EPMC10751233 | biostudies-literature

covRNA: discovering covariate associations in large-scale gene expression data. | S-EPMC7038619 | biostudies-literature

A general framework for detecting disease associations with rare variants in sequencing studies. | S-EPMC3169821 | biostudies-literature