
Dataset Information


A weighted U-statistic for genetic association analyses of sequencing data.


ABSTRACT: With advancements in next-generation sequencing technology, a massive amount of sequencing data is generated, which offers a great opportunity to comprehensively investigate the role of rare variants in the genetic etiology of complex diseases. Nevertheless, high-dimensional sequencing data poses a great challenge for statistical analysis. Association analyses based on traditional statistical methods suffer substantial power loss because of the low frequency of genetic variants and the extremely high dimensionality of the data. We developed a Weighted U Sequencing test, referred to as WU-SEQ, for the high-dimensional association analysis of sequencing data. Based on a nonparametric U-statistic, WU-SEQ makes no assumption about the underlying disease model or phenotype distribution, and can be applied to a variety of phenotypes. Through simulation studies and an empirical study, we showed that WU-SEQ outperformed a commonly used sequence kernel association test (SKAT) method when the underlying assumptions were violated (e.g., the phenotype followed a heavy-tailed distribution). Even when the assumptions were satisfied, WU-SEQ still attained comparable performance to SKAT. Finally, we applied WU-SEQ to sequencing data from the Dallas Heart Study (DHS), and detected an association between ANGPTL4 and very-low-density lipoprotein cholesterol.
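
The general idea of a weighted, rank-based U-statistic can be sketched as follows. This is a minimal illustration, not the WU-SEQ implementation: the abstract does not give the exact kernels, so the phenotype kernel (product of centered ranks), the genetic weight (a linear kernel on genotypes), the permutation p-value, and all function names here are assumptions chosen for illustration.

```python
import numpy as np

def weighted_u_statistic(genotypes, phenotype, variant_weights=None):
    """Illustrative weighted U-statistic (assumed form, not WU-SEQ itself).

    genotypes: (n, p) array of minor-allele counts.
    phenotype: length-n array; only its ranks are used.
    """
    G = np.asarray(genotypes, dtype=float)
    y = np.asarray(phenotype, dtype=float)
    n, p = G.shape
    if variant_weights is None:
        variant_weights = np.ones(p)

    # Rank-based phenotype scores, centered at zero (ties broken arbitrarily).
    ranks = np.argsort(np.argsort(y)) + 1
    scores = (ranks - (n + 1) / 2) / n

    # Assumed phenotype kernel: h(y_i, y_j) = s_i * s_j.
    H = np.outer(scores, scores)

    # Assumed genetic weight: weighted linear kernel between subjects.
    Gw = G * np.sqrt(np.asarray(variant_weights, dtype=float))
    W = Gw @ Gw.T
    np.fill_diagonal(W, 0.0)  # a U-statistic sums over pairs with i != j

    return float((W * H).sum() / (n * (n - 1)))

def permutation_pvalue(genotypes, phenotype, n_perm=999, seed=0):
    """One-sided permutation p-value, shuffling phenotypes across subjects."""
    rng = np.random.default_rng(seed)
    observed = weighted_u_statistic(genotypes, phenotype)
    exceed = 0
    for _ in range(n_perm):
        permuted = rng.permutation(phenotype)
        if weighted_u_statistic(genotypes, permuted) >= observed:
            exceed += 1
    return (exceed + 1) / (n_perm + 1)
```

Because only the ranks of the phenotype enter the statistic, any strictly increasing transformation of the phenotype leaves it unchanged, which is the sense in which such a statistic avoids assumptions about the phenotype distribution.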

SUBMITTER: Wei C 

PROVIDER: S-EPMC4236269 | biostudies-literature | 2014 Dec

REPOSITORIES: biostudies-literature


Publications

A weighted U-statistic for genetic association analyses of sequencing data.

Changshuai Wei, Ming Li, Zihuai He, Olga Vsevolozhskaya, Daniel J. Schaid, Qing Lu

Genetic Epidemiology, 2014-10-20, issue 8



Similar Datasets

| S-EPMC4899187 | biostudies-literature
| S-EPMC2633048 | biostudies-literature
| S-EPMC5054825 | biostudies-literature
| S-EPMC4143755 | biostudies-literature
| S-EPMC2387159 | biostudies-other
| S-EPMC7723344 | biostudies-literature
| S-EPMC4310867 | biostudies-other