Statistical Impact of Sample Size and Imbalance on Multivariate Analysis in silico and A Case Study in the UK Biobank.
ABSTRACT: Large-scale biobank cohorts coupled with electronic health records offer unprecedented opportunities to study genotype-phenotype relationships. Genome-wide association studies have uncovered disease-associated loci through univariate methods, which focus on one trait at a time. With genetic variants identified for thousands of traits, researchers have found that 90% of human genetic loci are associated with more than one trait, highlighting the ubiquity of pleiotropy. Recently, multivariate methods have been proposed to identify pleiotropy effectively. However, their statistical performance in natural biomedical data, which often have unbalanced case-control sample sizes, is largely unknown. In this work, we designed 21 scenarios of real-data-informed simulations to thoroughly evaluate the statistical characteristics of univariate and multivariate methods. Our results can serve as a reference guide for the application of multivariate methods. We also investigated potential pleiotropy across type II diabetes, Alzheimer's disease, atherosclerosis of arteries, depression, and atherosclerotic heart disease in the UK Biobank.
SUBMITTER: Zhang X
PROVIDER: S-EPMC8075427 | biostudies-literature
REPOSITORIES: biostudies-literature