Dataset Information

Prioritize and select SNPs for association studies with multi-stage designs.


ABSTRACT: Large-scale whole genome association studies are increasingly common, due in large part to recent advances in genotyping technology. With this change in paradigm for genetic studies of complex diseases, it is vital to develop valid, powerful, and efficient statistical tools and approaches to evaluate such data. Despite a dramatic drop in genotyping costs, it is still expensive to genotype thousands of individuals for hundreds of thousands of single nucleotide polymorphisms (SNPs) in large-scale whole genome association studies. A multi-stage (or two-stage) design has been a promising alternative: in the first stage, only a fraction of the samples are genotyped and tested using a dense set of SNPs, and only the small subset of markers that show moderate association with the disease is genotyped in later stages. Multi-stage designs have also been used in candidate gene association studies, usually in regions that have shown strong signals in linkage studies. To decide which set of SNPs to genotype in the next stage, a common practice is to use a simple test (such as a χ² test for case-control data) and a liberal significance level without correction for multiple testing, to ensure that no true signals are filtered out. In this paper, I have developed a novel SNP selection procedure within the framework of multi-stage designs. Based on data from stage 1, the method explicitly explores correlations (linkage disequilibrium) among SNPs and their possible interactions in determining the disease phenotype. Compared with a regular multi-stage design, the approach can select a much smaller set of SNPs with high discriminative power for later stages. It therefore not only reduces the genotyping cost in later stages but also increases statistical power by reducing the number of tests. A combined analysis is proposed to further improve power, and the theoretical significance level of the combined statistic is derived.
Extensive simulations have been performed, and the results show that the procedure can reduce the number of SNPs required in later stages, with improved power to detect associations. The procedure has also been applied to a real data set from a genome-wide association study of sporadic amyotrophic lateral sclerosis (ALS), and an interesting set of candidate SNPs has been identified.
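
As an illustration of the "common practice" the abstract describes for stage 1 (a simple χ² test with a liberal, uncorrected significance level), the following is a minimal sketch only. It is not the paper's selection procedure, which additionally uses linkage disequilibrium and SNP interactions. The function names, the 2x2 allele-count layout, the threshold of 0.05, and the toy counts are all assumptions made for this example.

```python
import math

def allelic_chi2(case_alt, case_ref, ctrl_alt, ctrl_ref):
    """Pearson chi-square statistic (1 df) for a 2x2 allele-count table."""
    table = [[case_alt, case_ref], [ctrl_alt, ctrl_ref]]
    total = case_alt + case_ref + ctrl_alt + ctrl_ref
    row_sums = [case_alt + case_ref, ctrl_alt + ctrl_ref]
    col_sums = [case_alt + ctrl_alt, case_ref + ctrl_ref]
    stat = 0.0
    for i in range(2):
        for j in range(2):
            expected = row_sums[i] * col_sums[j] / total
            if expected == 0:
                # Degenerate table (e.g. monomorphic SNP): no evidence either way.
                return 0.0
            stat += (table[i][j] - expected) ** 2 / expected
    return stat

def chi2_pvalue_1df(stat):
    """Upper-tail p-value of a chi-square variable with 1 degree of freedom."""
    return math.erfc(math.sqrt(stat / 2.0))

def stage1_screen(counts, alpha=0.05):
    """Return SNP ids whose uncorrected p-value falls below a liberal alpha.

    `counts` maps a (hypothetical) SNP id to the tuple
    (case_alt, case_ref, ctrl_alt, ctrl_ref) of allele counts.
    """
    selected = []
    for snp_id, (ca, cr, ta, tr) in counts.items():
        if chi2_pvalue_1df(allelic_chi2(ca, cr, ta, tr)) < alpha:
            selected.append(snp_id)
    return selected

# Toy stage-1 allele counts, invented for illustration only.
counts = {
    "snp_a": (60, 40, 40, 60),
    "snp_b": (50, 50, 50, 50),
    "snp_c": (55, 45, 48, 52),
}
selected = stage1_screen(counts, alpha=0.05)
print(selected)
```

Only the SNPs returned by `stage1_screen` would be carried forward for genotyping in the remaining samples. No multiple-testing correction is applied at this step, matching the liberal screening the abstract describes.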

SUBMITTER: Li J 

PROVIDER: S-EPMC3326652 | biostudies-literature | 2008 Apr

REPOSITORIES: biostudies-literature

Publications

Prioritize and select SNPs for association studies with multi-stage designs.

Li Jing J  

Journal of computational biology : a journal of computational molecular cell biology 20080401 3


Similar Datasets

| S-EPMC2868915 | biostudies-literature
| S-EPMC4143728 | biostudies-literature
| S-EPMC7612189 | biostudies-literature
| S-EPMC6168085 | biostudies-literature
| S-EPMC3006123 | biostudies-literature
| S-EPMC1959193 | biostudies-literature
| S-EPMC3168369 | biostudies-literature
| S-EPMC2896195 | biostudies-literature
2012-01-11 | E-GEOD-34945 | biostudies-arrayexpress
2012-01-11 | GSE34945 | GEO