Dataset Information

Design of association studies with pooled or un-pooled next-generation sequencing data.

ABSTRACT: Most common hereditary diseases in humans are complex and multifactorial. Large-scale genome-wide association studies based on SNP genotyping have identified only a small fraction of the heritable variation of these diseases. One explanation may be that many rare variants (minor allele frequency, MAF < 5%), which are not included in the common genotyping platforms, contribute substantially to the genetic variation of these diseases. Next-generation sequencing, which allows the analysis of rare variants, is now becoming cheap enough to provide a viable alternative to SNP genotyping. In this paper, we present cost-effective protocols for using next-generation sequencing in association mapping studies based on pooled and un-pooled samples, and identify optimal designs with respect to the total number of individuals, the number of individuals per pool, and the sequencing coverage. We perform a small empirical study to evaluate the pooling variance in a realistic setting where pooling is combined with exon-capturing. To test for associations, we develop a likelihood ratio statistic that accounts for the high error rate of next-generation sequencing data. We also perform extensive simulations to determine the power and accuracy of this method. Overall, our findings suggest that, at a fixed cost, sequencing many individuals at a shallower depth with a larger pool size achieves higher power than sequencing a small number of individuals at a higher depth with a smaller pool size, even in the presence of high error rates. Our results provide guidelines for researchers who are developing association mapping studies based on next-generation sequencing.

SUBMITTER: Kim SY

PROVIDER: S-EPMC5001557 | biostudies-literature | 2010 Jul

REPOSITORIES: biostudies-literature

Publications

Design of association studies with pooled or un-pooled next-generation sequencing data.

Kim SY, Li Y, Guo Y, Li R, Holmkvist J, Hansen T, Pedersen O, Wang J, Nielsen R

Genetic Epidemiology, 2010 Jul 1, issue 5

PMID: 20552648

Similar Datasets

Big data challenges in bone research: genome-wide association studies and next-generation sequencing. | S-EPMC4325556 | biostudies-other

Implication of next-generation sequencing on association studies. | S-EPMC3148210 | biostudies-literature

Family-based association studies for next-generation sequencing. | S-EPMC3370281 | biostudies-literature

Analysis and optimal design for association studies using next-generation sequencing with case-control pools. | S-EPMC4139478 | biostudies-literature

Gene-set association tests for next-generation sequencing data. | S-EPMC5013913 | biostudies-literature

Estimation of allele frequency and association mapping using next-generation sequencing data. | S-EPMC3212839 | biostudies-other

SNPGenie: estimating evolutionary parameters to detect natural selection using pooled next-generation sequencing data. | S-EPMC4757956 | biostudies-literature

NGSNGS: next-generation simulator for next-generation sequencing data. | S-EPMC9891242 | biostudies-literature

Detecting selective sweeps from pooled next-generation sequencing samples. | S-EPMC3424412 | biostudies-literature

PoPoolation: a toolbox for population genetic analysis of next generation sequencing data from pooled individuals. | S-EPMC3017084 | biostudies-literature