Dataset Information

Multiethnic polygenic risk scores improve risk prediction in diverse populations.

ABSTRACT: Methods for genetic risk prediction have been widely investigated in recent years. However, most available training data involves European samples, and it is currently unclear how to accurately predict disease risk in other populations. Previous studies have used either training data from European samples in large sample size or training data from the target population in small sample size, but not both. Here, we introduce a multiethnic polygenic risk score that combines training data from European samples and training data from the target population. We applied this approach to predict type 2 diabetes (T2D) in a Latino cohort using both publicly available European summary statistics in large sample size (N_eff = 40k) and Latino training data in small sample size (N_eff = 8k). Here, we attained a >70% relative improvement in prediction accuracy (from R² = 0.027 to 0.047) compared to methods that use only one source of training data, consistent with large relative improvements in simulations. We observed a systematically lower load of T2D risk alleles in Latino individuals with more European ancestry, which could be explained by polygenic selection in ancestral European and/or Native American populations. We predict T2D in a South Asian UK Biobank cohort using European (N_eff = 40k) and South Asian (N_eff = 16k) training data and attained a >70% relative improvement in prediction accuracy, and application to predict height in an African UK Biobank cohort using European (N = 113k) and African (N = 2k) training data attained a 30% relative improvement. Our work reduces the gap in polygenic risk prediction accuracy between European and non-European target populations.
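
The relative improvement quoted in the abstract can be checked from the two reported R² values. The sketch below does that arithmetic and also illustrates, under stated assumptions, what combining two scores might look like. The abstract does not say how the European-trained and target-population-trained scores are combined, so the weighted sum in `combine_scores`, its weights, and all function names are hypothetical illustrations, not the authors' method.

```python
from statistics import mean


def relative_improvement(r2_base, r2_new):
    """Relative gain in prediction accuracy when moving from r2_base to r2_new."""
    return (r2_new - r2_base) / r2_base


def combine_scores(prs_a, prs_b, w_a, w_b):
    """Hypothetical combination: a per-individual weighted sum of two risk scores.

    The weights are placeholders; the abstract does not specify how they
    would be chosen.
    """
    return [w_a * a + w_b * b for a, b in zip(prs_a, prs_b)]


def r_squared(score, phenotype):
    """Squared Pearson correlation between a score and a phenotype."""
    ms, mp = mean(score), mean(phenotype)
    cov = sum((s - ms) * (p - mp) for s, p in zip(score, phenotype))
    var_s = sum((s - ms) ** 2 for s in score)
    var_p = sum((p - mp) ** 2 for p in phenotype)
    return cov * cov / (var_s * var_p)


# R² values reported in the abstract for the Latino T2D analysis.
gain = relative_improvement(0.027, 0.047)
print(f"relative improvement: {gain:.1%}")  # about 74%, consistent with ">70%"
```

Going from R² = 0.027 to R² = 0.047 is a gain of 0.020 on a base of 0.027, which is about 74% and so consistent with the ">70%" stated in the abstract.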

SUBMITTER: Marquez-Luna C

PROVIDER: S-EPMC5726434 | biostudies-literature | 2017 Dec

REPOSITORIES: biostudies-literature

Publications

Multiethnic polygenic risk scores improve risk prediction in diverse populations.

Márquez-Luna C, Loh PR, Price AL

Genetic Epidemiology, 2017 Nov 7, issue 8

PMID: 29110330

Similar Datasets

Project description:

Background: Cardiovascular diseases (CVD) are a major health concern in Africa. Improved identification and treatment of high-risk individuals can reduce adverse health outcomes. Current CVD risk calculators are largely unvalidated in African populations and overlook genetic factors. Polygenic scores (PGS) can enhance risk prediction by measuring genetic susceptibility to CVD, but their effectiveness in genetically diverse populations is limited by a European-ancestry bias. To address this, we developed models integrating genetic data and conventional risk factors to assess the risk of developing cardiometabolic outcomes in African populations.

Methods: We used summary statistics from a genome-wide association meta-analysis (n = 14,126) in African populations to derive novel genome-wide PGS for 14 cardiometabolic traits in an independent African target sample (Africa Wits-INDEPTH Partnership for Genomic Research (AWI-Gen), n = 10,603). Regression analyses assessed relationships between each PGS and corresponding cardiometabolic trait, and seven CVD outcomes (CVD, heart attack, stroke, diabetes mellitus, dyslipidaemia, hypertension, and obesity). The predictive utility of the genetic data was evaluated using elastic net models containing multiple PGS (MultiPGS) and reference-projected principal components of ancestry (PPCs). An integrated risk prediction model incorporating genetic and conventional risk factors was developed. Nested cross-validation was used when deriving elastic net models to enhance generalisability.

Results: Our African-specific PGS displayed significant but variable within- and cross-trait prediction (max. R² = 6.8%, p = 1.86 × 10⁻¹⁷³). Significantly associated PGS with dyslipidaemia included the PGS for total cholesterol (logOR = 0.210, SE = 0.022, p = 2.18 × 10⁻²¹) and low-density lipoprotein (logOR = -0.141, SE = 0.022, p = 1.30 × 10⁻²⁰); with hypertension, the systolic blood pressure PGS (logOR = 0.150, SE = 0.045, p = 8.34 × 10⁻⁴); and multiple PGS associated with obesity: body mass index (max. logOR = 0.131, SE = 0.031, p = 2.22 × 10⁻⁵), hip circumference (logOR = 0.122, SE = 0.029, p = 2.28 × 10⁻⁵), waist circumference (logOR = 0.013, SE = 0.098, p = 8.13 × 10⁻⁴) and weight (logOR = 0.103, SE = 0.029, p = 4.89 × 10⁻⁵). Elastic net models incorporating MultiPGS and PPCs significantly improved prediction over MultiPGS alone. Models including genetic data and conventional risk factors were more predictive than conventional risk models alone (dyslipidaemia: R² increase = 2.6%, p = 4.45 × 10⁻¹²; hypertension: R² increase = 2.6%, p = 2.37 × 10⁻¹³; obesity: R² increase = 5.5%, p = 1.33 × 10⁻³⁴).

Conclusions: In African populations, CVD and associated cardiometabolic trait prediction models can be improved by incorporating ancestry-aligned PGS and accounting for ancestry. Combining PGS with conventional risk factors further enhances prediction over traditional models based on conventional factors. Incorporating data from target populations can improve the generalisability of international predictive models for CVD and associated traits in African populations.
