Dataset Information

Robust SNP-based prediction of rheumatoid arthritis through machine-learning-optimized polygenic risk score.


ABSTRACT: The popular statistics-based genome-wide association studies (GWAS) have provided deep insights into the field of complex disorder genetics. However, their clinical applicability to predict disease/trait outcomes remains unclear, as statistical models are not designed to make predictions. This study employs a statistics-free machine-learning (ML)-optimized polygenic risk score (PRS) to complement existing GWAS and bring the prediction of disease/trait outcomes closer to clinical application. Rheumatoid arthritis (RA) was selected as a model disease to demonstrate the robustness of ML in disease prediction, as RA is a prevalent chronic inflammatory joint disease with high mortality rates, affecting adults in their economic prime. Early identification of at-risk individuals may facilitate measures to mitigate the effects of the disease. This study employs a robust ML feature selection algorithm to identify single nucleotide polymorphisms (SNPs) that can predict RA from a set of training data comprising RA patients and population control samples. Thereafter, the selected SNPs were evaluated for their predictive performance across 3 independent, unseen test datasets. The selected SNPs were subsequently used to generate a PRS, which was also evaluated for its predictive capacity as a sole feature. Through robust ML feature selection, 9 SNPs were found to be the minimum number of features for excellent predictive performance (AUC > 0.9) in 3 independent, unseen test datasets. The PRS based on these 9 SNPs was significantly associated with (P < 1 × 10⁻¹⁶) and predictive of (AUC > 0.9) RA in the 3 unseen datasets. An RA ML-PRS calculator of these 9 SNPs was developed ( https://xistance.shinyapps.io/prs-ra/ ) to facilitate individualized clinical applicability. The majority of the predictive SNPs are protective, reside in non-coding regions, and are either predicted to be potentially functional SNPs (pfSNPs) or in high linkage disequilibrium (r² > 0.8) with un-interrogated pfSNPs. These findings highlight the promise of this ML strategy to identify useful genetic features that can robustly predict disease and are amenable to translation for clinical application.
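
As a rough illustration of the two quantities the abstract reports, the sketch below computes a PRS as a weighted sum of allele dosages and scores it with AUC. It is not the authors' pipeline: the abstract does not state how the SNPs are weighted, so the additive form, the weights, the three-SNP genotypes, and the function names are all hypothetical placeholders chosen for the example.

```python
from typing import Sequence


def polygenic_risk_score(dosages: Sequence[float], weights: Sequence[float]) -> float:
    """Weighted sum of allele dosages (0, 1 or 2 copies per SNP).

    Assumption: a simple additive model; the study's actual weighting is not
    given in the abstract.
    """
    if len(dosages) != len(weights):
        raise ValueError("one weight per SNP is required")
    return sum(d * w for d, w in zip(dosages, weights))


def auc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """AUC as the fraction of (case, control) pairs where the case scores higher.

    Ties count as half a win. Labels: 1 = case, 0 = control.
    """
    cases = [s for s, y in zip(scores, labels) if y == 1]
    controls = [s for s, y in zip(scores, labels) if y == 0]
    if not cases or not controls:
        raise ValueError("need at least one case and one control")
    wins = sum(
        1.0 if c > k else 0.5 if c == k else 0.0
        for c in cases
        for k in controls
    )
    return wins / (len(cases) * len(controls))


# Hypothetical weights for three SNPs; negative values stand for protective alleles.
WEIGHTS = [-0.5, -0.25, 0.75]

# Hypothetical genotypes (allele dosages per SNP), one row per individual.
GENOTYPES = [
    [0, 0, 2],  # case
    [1, 0, 1],  # case
    [2, 1, 0],  # control
    [2, 2, 1],  # control
]
LABELS = [1, 1, 0, 0]

SCORES = [polygenic_risk_score(g, WEIGHTS) for g in GENOTYPES]

if __name__ == "__main__":
    print("PRS per individual:", SCORES)
    print("AUC:", auc(SCORES, LABELS))
```

With a single score per individual, as here, AUC depends only on how the scores rank cases against controls, which is why a PRS can be evaluated as a sole feature on unseen datasets.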

SUBMITTER: Lim AJW 

PROVIDER: S-EPMC9903430 | biostudies-literature | 2023 Feb

REPOSITORIES: biostudies-literature

Publications

Robust SNP-based prediction of rheumatoid arthritis through machine-learning-optimized polygenic risk score.

Lim AJW, Tyniana CT, Lim LJ, Tan JWL, Koh ET, Chong SS, Khor CC, Leong KP, Lee CG

Journal of Translational Medicine, 2023-02-07, 1


Similar Datasets

| S-EPMC5627249 | biostudies-literature
| S-EPMC8666017 | biostudies-literature
2013-01-01 | E-GEOD-29210 | biostudies-arrayexpress
2024-12-27 | GSE246294 | GEO
| S-EPMC11348567 | biostudies-literature
| S-EPMC8501710 | biostudies-literature
| S-EPMC10665911 | biostudies-literature
| S-EPMC6334178 | biostudies-literature
| S-EPMC9068780 | biostudies-literature
| S-EPMC9236264 | biostudies-literature