Dataset Information

Generating samples for association studies based on HapMap data.


ABSTRACT: With the completion of the HapMap project, a variety of computational algorithms and tools have been proposed for haplotype inference, tag SNP selection and genome-wide association studies. Simulated data are commonly used to evaluate these newly developed approaches. In addition to simulations based on population models, empirical data generated by perturbing real data have also been used because they may inherit specific properties of the real data. However, no publicly available tool generates large-scale simulated variation data while taking into account knowledge from the HapMap project.

A computer program (gs) was developed to quickly generate a large number of samples based on real data. The samples are useful for a variety of purposes, including evaluating methods for haplotype inference, tag SNP selection and association studies. Two approaches have been implemented to generate dense SNP haplotype/genotype data whose local linkage disequilibrium (LD) patterns are similar to those in human populations. The first approach takes haplotype pairs from samples as input, and the second takes patterns of haplotype block structures as input. Both quantitative and qualitative traits have been incorporated in the program. Phenotypes are generated based on a disease model or on the effect of a quantitative trait nucleotide, both of which can be specified by users. In addition to single-locus disease models, two-locus disease models that can incorporate any degree of epistasis have also been implemented. Users can specify all nine parameters in a 3 x 3 penetrance table. For several commonly used two-locus disease models, the program can automatically calculate penetrances from the population prevalence and marginal effects of a disease, which users can conveniently specify.

The program gs can effectively generate large-scale genetic and phenotypic variation data that can be used for evaluating newly developed approaches.
It is freely available from the authors' web site at http://www.eecs.case.edu/~jxl175/gs.html.
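
To illustrate how a 3 x 3 penetrance table can drive phenotype generation under a two-locus disease model, here is a minimal Python sketch. It is not the gs implementation or its input format. The table values, the function names, the 0/1 allele coding, and the assumption of Hardy-Weinberg proportions with independent loci in `prevalence` are all hypothetical choices made for illustration.

```python
import random

# Hypothetical 3 x 3 penetrance table (illustrative values only).
# PENETRANCE[a][b] = P(affected | genotype a at locus A, genotype b at locus B),
# where a genotype is coded as the number (0, 1 or 2) of risk alleles carried.
PENETRANCE = [
    [0.01, 0.01, 0.01],
    [0.01, 0.05, 0.10],
    [0.01, 0.10, 0.20],
]


def genotype(hap1, hap2, locus):
    """Count risk alleles at one locus from a pair of 0/1-coded haplotypes."""
    return hap1[locus] + hap2[locus]


def assign_status(hap1, hap2, locus_a, locus_b, table, rng):
    """Draw a case (1) / control (0) label from the two-locus penetrance."""
    a = genotype(hap1, hap2, locus_a)
    b = genotype(hap1, hap2, locus_b)
    return 1 if rng.random() < table[a][b] else 0


def prevalence(table, freq_a, freq_b):
    """Population prevalence implied by the table.

    Assumes Hardy-Weinberg proportions at each locus and independence
    between the two loci (an assumption of this sketch, not of gs).
    """
    def hwe(p):
        return [(1 - p) ** 2, 2 * p * (1 - p), p ** 2]

    ga, gb = hwe(freq_a), hwe(freq_b)
    return sum(ga[a] * gb[b] * table[a][b] for a in range(3) for b in range(3))


if __name__ == "__main__":
    rng = random.Random(0)
    # Two toy haplotypes over four SNPs; the disease loci are SNPs 1 and 3.
    h1, h2 = [0, 1, 0, 1], [1, 1, 0, 0]
    print(assign_status(h1, h2, 1, 3, PENETRANCE, rng))
    print(prevalence(PENETRANCE, 0.3, 0.2))
```

In this sketch, choosing the nine table entries directly corresponds to specifying a full penetrance table, while `prevalence` shows the kind of constraint a tool would have to satisfy when deriving penetrances from a target population prevalence.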

SUBMITTER: Li J 

PROVIDER: S-EPMC2375120 | biostudies-literature | 2008 Jan

REPOSITORIES: biostudies-literature

Publications

Generating samples for association studies based on HapMap data.

Li Jing, Chen Yixuan

BMC Bioinformatics, 2008-01-24



Similar Datasets

| S-EPMC2478729 | biostudies-literature
| S-EPMC1852710 | biostudies-other
| S-EPMC2928027 | biostudies-literature
| S-EPMC2649532 | biostudies-literature
| S-EPMC4041108 | biostudies-literature
| S-EPMC2665689 | biostudies-literature
| S-EPMC5542532 | biostudies-other
| S-EPMC3369696 | biostudies-literature
| S-EPMC3990764 | biostudies-literature
| S-EPMC4629477 | biostudies-literature