Dataset Information

MixWAS: An efficient distributed algorithm for mixed-outcomes genome-wide association studies.


ABSTRACT: Genome-wide association studies (GWAS) have been instrumental in identifying genetic associations for various diseases and traits. However, uncovering genetic underpinnings among traits beyond univariate phenotype associations remains a challenge. Multi-phenotype associations (MPA), or genetic pleiotropy, offer important insights into shared genes and pathways among traits, enhancing our understanding of the genetic architectures of complex diseases. GWAS of biobank-linked electronic health record (EHR) data are increasingly being used to identify MPA among various traits and diseases. However, methodologies that can efficiently take advantage of distributed EHRs to detect MPA are still lacking. Here, we introduce mixWAS, a novel algorithm that efficiently and losslessly integrates multiple EHRs via summary statistics, allowing the detection of MPA among mixed phenotypes while accounting for heterogeneity across EHRs. Simulations demonstrate that mixWAS outperforms the widely used MPA detection method, the phenome-wide association study (PheWAS), across diverse scenarios. Applying mixWAS to data from seven EHRs in the US, we identified 4,534 MPA among blood lipids, BMI, and circulatory diseases. Validation in independent EHR data from the UK confirmed 97.7% of the associations. mixWAS fundamentally improves the detection of MPA and is available as free, open-source software.

SUBMITTER: Li R 

PROVIDER: S-EPMC10802662 | biostudies-literature | 2024 Jan

REPOSITORIES: biostudies-literature

Publications

A One-Shot Lossless Algorithm for Cross-Cohort Learning in Mixed-Outcomes Analysis.

Ruowang Li, Luke Benz, Rui Duan, Joshua C. Denny, Hakon Hakonarson, Jonathan D. Mosley, Jordan W. Smoller, Wei-Qi Wei, Thomas Lumley, Marylyn D. Ritchie, Jason H. Moore, Yong Chen

medRxiv: the preprint server for health sciences, 2024-12-04


In cross-cohort studies, integrating diverse datasets, such as electronic health records (EHRs), is both essential and challenging due to cohort-specific variations, distributed data storage, and data privacy concerns. Traditional methods often require data pooling or complex data harmonization, which can reduce efficiency and limit the scope of cross-cohort learning. We introduce mixWAS, a one-shot, lossless algorithm that efficiently integrates distributed EHR datasets via summary statistics. ...

Similar Datasets

S-EPMC3386377 | biostudies-literature
S-EPMC4211878 | biostudies-literature
S-EPMC3042187 | biostudies-literature
S-EPMC3386481 | biostudies-literature
S-EPMC8206161 | biostudies-literature
S-EPMC7684894 | biostudies-literature
S-EPMC4112407 | biostudies-literature
S-EPMC2931336 | biostudies-literature
S-EPMC4255454 | biostudies-literature
S-EPMC2894516 | biostudies-literature