Dataset Information

Prioritize and select SNPs for association studies with multi-stage designs.

ABSTRACT: Large-scale whole genome association studies are increasingly common, due in large part to recent advances in genotyping technology. With this change in paradigm for genetic studies of complex diseases, it is vital to develop valid, powerful, and efficient statistical tools and approaches to evaluate such data. Despite a dramatic drop in genotyping costs, it is still expensive to genotype thousands of individuals for hundreds of thousands of single nucleotide polymorphisms (SNPs) in large-scale whole genome association studies. A multi-stage (or two-stage) design has been a promising alternative: in the first stage, only a fraction of the samples are genotyped and tested using a dense set of SNPs, and only a small subset of markers that show moderate association with the disease are genotyped in later stages. Multi-stage designs have also been used in candidate gene association studies, usually in regions that have shown strong signals in linkage studies. To decide which SNPs to genotype in the next stage, a common practice is to use a simple test (such as a chi-square test for case-control data) and a liberal significance level without correction for multiple testing, to ensure that no true signals are filtered out. In this paper, I have developed a novel SNP selection procedure within the framework of multi-stage designs. Based on data from stage 1, the method explicitly explores correlations (linkage disequilibrium) among SNPs and their possible interactions in determining the disease phenotype. Compared with a regular multi-stage design, the approach can select a much smaller set of SNPs with high discriminative power for later stages. Therefore, it not only reduces the genotyping cost in later stages but also increases statistical power by reducing the number of tests. A combined analysis is proposed to further improve power, and the theoretical significance level of the combined statistic is derived.
Extensive simulations have been performed, and the results show that the procedure can reduce the number of SNPs required in later stages while improving the power to detect associations. The procedure has also been applied to a real data set from a genome-wide association study of sporadic amyotrophic lateral sclerosis (ALS), and an interesting set of candidate SNPs has been identified.
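
The baseline screening step that the abstract calls "common practice" (a simple chi-square test at a liberal, uncorrected significance level) can be sketched as follows. This is only an illustration of that baseline, not the selection procedure proposed in the paper. The function names, the allelic 2x2 table layout, the threshold of 0.05, and the SNP labels and counts are all assumptions made for the example.

```python
import math


def allelic_chi2(case_minor, case_major, ctrl_minor, ctrl_major):
    """Pearson chi-square test (1 df) on a 2x2 table of allele counts.

    Returns (statistic, p_value). The table layout is an assumption:
    rows are cases/controls, columns are minor/major allele counts.
    """
    a, b, c, d = case_minor, case_major, ctrl_minor, ctrl_major
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    if denom == 0:
        # A row or column is empty, so the test is uninformative.
        return 0.0, 1.0
    stat = n * (a * d - b * c) ** 2 / denom
    # For 1 degree of freedom, P(chi2 > x) = erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(stat / 2.0))
    return stat, p_value


def select_for_stage2(stage1_counts, alpha=0.05):
    """Keep SNPs whose uncorrected stage-1 p-value is below alpha."""
    selected = []
    for snp_id, counts in stage1_counts.items():
        _, p_value = allelic_chi2(*counts)
        if p_value < alpha:
            selected.append(snp_id)
    return selected


# Hypothetical stage-1 allele counts:
# (case minor, case major, control minor, control major).
stage1_counts = {
    "snp_A": (60, 140, 40, 160),
    "snp_B": (50, 150, 50, 150),
    "snp_C": (90, 110, 60, 140),
}

print(select_for_stage2(stage1_counts))
```

Because the threshold is applied without any multiple-testing correction, this filter is deliberately permissive. According to the abstract, the paper's procedure instead uses linkage disequilibrium and possible interactions among SNPs to carry a smaller set forward.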

SUBMITTER: Li J

PROVIDER: S-EPMC3326652 | biostudies-literature | 2008 Apr

REPOSITORIES: biostudies-literature

Publications

Prioritize and select SNPs for association studies with multi-stage designs.

Li Jing J

Journal of computational biology : a journal of computational molecular cell biology, 2008-04-01, issue 3

PMID: 18352819

Similar Datasets

Project description: Background: Although genes play a key role in many complex diseases, the specific genes involved in most complex diseases remain largely unidentified. Their discovery will hinge on the identification of key sequence variants that are conclusively associated with disease. While much attention has been focused on variants in protein-coding DNA, variants in noncoding regions may also play many important roles in complex disease by altering gene regulation. Since the vast majority of noncoding genomic sequence is of unknown function, this increases the challenge of identifying "functional" variants that cause disease. However, evolutionary conservation can be used as a guide to indicate regions of noncoding or coding DNA that are likely to have biological function, and thus may be more likely to harbor SNP variants with functional consequences. To help bias marker selection in favor of such variants, we devised a process that prioritizes annotated SNPs for genotyping studies based on their location within Multi-species Conserved Sequences (MCSs) and used this process to select SNPs in a region of linkage to a complex disease. This allowed us to evaluate the utility of the chosen SNPs for further association studies. Previously, a region of chromosome 1q43 was linked to Multiple Sclerosis (MS) in a genome-wide screen. We chose annotated SNPs in the region based on location within MCSs (termed MCS-SNPs). We then obtained genotypes for 478 MCS-SNPs in 989 individuals from MS families. Results: Analysis of our MCS-SNP genotypes from the 1q43 region and comparison to HapMap data confirmed that annotated SNPs in MCS regions are frequently polymorphic and show subtle signatures of selective pressure, consistent with previous reports of genome-wide variation in conserved regions.
We also present an online tool that allows MCS data to be directly exported to the UCSC genome browser so that MCS-SNPs can be easily identified within genomic regions of interest. Conclusion: Our results showed that MCS can easily be used to prioritize markers for follow-up and candidate gene association studies. We believe that this novel approach demonstrates a paradigm for expediting the search for genes contributing to complex diseases.
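
The core idea of that project description, keeping annotated SNPs whose position falls inside a conserved region, can be sketched as a simple interval lookup. This is a minimal sketch, not the authors' pipeline or their online tool. It assumes inclusive coordinates on a single chromosome and non-overlapping MCS intervals, and the function name, SNP labels, and coordinates are hypothetical.

```python
from bisect import bisect_right


def snps_in_conserved_regions(snp_positions, mcs_intervals):
    """Return IDs of SNPs located inside any conserved interval.

    snp_positions: dict mapping SNP ID -> position.
    mcs_intervals: list of (start, end) tuples, inclusive and assumed
    to be non-overlapping.
    """
    intervals = sorted(mcs_intervals)
    starts = [start for start, _ in intervals]
    selected = []
    for snp_id, pos in snp_positions.items():
        # Find the last interval that starts at or before the SNP.
        i = bisect_right(starts, pos) - 1
        if i >= 0 and pos <= intervals[i][1]:
            selected.append(snp_id)
    return selected


# Hypothetical conserved intervals and SNP positions.
mcs_intervals = [(500, 650), (100, 200)]
snp_positions = {"snp_demo1": 150, "snp_demo2": 300, "snp_demo3": 650}

print(snps_in_conserved_regions(snp_positions, mcs_intervals))
```

Sorting the intervals once and using binary search keeps each lookup logarithmic in the number of intervals, which matters when many annotated SNPs are screened against a long list of conserved regions.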

Project description: Background: Multi-arm multi-stage trials are an efficient, adaptive approach for testing many treatments simultaneously within one protocol. In settings where numbers of patients available to be entered into trials and resources might be limited, such as primary postpartum haemorrhage, it may be necessary to select a pre-specified subset of arms at interim stages even if they are all showing some promise against the control arm. This will put a limit on the maximum number of patients required and reduce the associated costs. Motivated by the World Health Organization Refractory HaEmorrhage Devices trial in postpartum haemorrhage, we explored the properties of such a selection design in a randomised phase III setting and compared it with other alternatives. The objectives are: (1) to investigate how the timing of treatment selection affects the operating characteristics; (2) to explore the use of an information-rich (continuous) intermediate outcome to select the best-performing arm, out of four treatment arms, compared with using the primary (binary) outcome for selection at the interim stage; and (3) to identify factors that can affect the efficiency of the design. Methods: We conducted simulations based on the refractory haemorrhage devices multi-arm multi-stage selection trial to investigate the impact of the timing of treatment selection and applying an adaptive allocation ratio on the probability of correct selection, overall power and familywise type I error rate. Simulations were also conducted to explore how other design parameters will affect both the maximum sample size and trial timelines. Results: The results indicate that the overall power of the trial is bounded by the probability of 'correct' selection at the selection stage. The results showed that good operating characteristics are achieved if the treatment selection is conducted at around 17% of information time.
Our results also showed that although randomising more patients to research arms before selection will increase the probability of selecting correctly, this will not increase the overall efficiency of the (selection) design compared with the fixed allocation ratio of 1:1 to all arms throughout. Conclusions: Multi-arm multi-stage selection designs are efficient and flexible with desirable operating characteristics. We give guidance on many aspects of these designs including selecting the intermediate outcome measure, the timing of treatment selection, and choosing the operating characteristics.
