Dataset Information

Building a genetic risk model for bipolar disorder from genome-wide association data with random forest algorithm.

ABSTRACT: A genetic risk score could be beneficial in assisting clinical diagnosis for complex diseases with high heritability. Using large-scale genome-wide association (GWA) data, the current study constructed a genetic risk model for bipolar disorder (BPD) with a machine learning approach. The GWA dataset of BPD from the Genetic Association Information Network was used as the training data for model construction, and the Systematic Treatment Enhancement Program (STEP) GWA data were used as the validation dataset. A random forest algorithm was applied to pre-filtered markers, and variable importance indices were assessed. A total of 289 candidate markers were selected by random forest procedures with good discriminability; the area under the receiver operating characteristic curve was 0.944 (0.935-0.953) in the training set and 0.702 (0.681-0.723) in the STEP dataset. Using a score cutoff of 184, the sensitivity and specificity for BPD were 0.777 and 0.854, respectively. Pathway analyses revealed important biological pathways for the identified genes. In conclusion, the present study identified informative genetic markers that differentiate BPD from healthy controls with acceptable discriminability in the validation dataset. In the future, diagnostic classification can be further improved by assessing more comprehensive clinical risk factors and jointly analysing them with genetic data in large samples.
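
The two evaluation measures named in the abstract, the area under the ROC curve and the sensitivity and specificity at a score cutoff, can be illustrated with a minimal sketch. Everything below is an assumption made for illustration: the function names, the toy scores and labels (invented, not taken from the GAIN or STEP data), and the rule that a score at or above the cutoff counts as a predicted case. It is not the authors' pipeline and does not reproduce their results.

```python
from typing import Sequence, Tuple


def roc_auc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """AUC as the probability that a randomly chosen case scores higher
    than a randomly chosen control (ties count as 0.5)."""
    cases = [s for s, y in zip(scores, labels) if y == 1]
    controls = [s for s, y in zip(scores, labels) if y == 0]
    if not cases or not controls:
        raise ValueError("need at least one case and one control")
    wins = 0.0
    for case_score in cases:
        for control_score in controls:
            if case_score > control_score:
                wins += 1.0
            elif case_score == control_score:
                wins += 0.5
    return wins / (len(cases) * len(controls))


def sensitivity_specificity(
    scores: Sequence[float], labels: Sequence[int], cutoff: float
) -> Tuple[float, float]:
    """Assumed decision rule: score >= cutoff is a predicted case."""
    pairs = list(zip(scores, labels))
    tp = sum(1 for s, y in pairs if y == 1 and s >= cutoff)
    fn = sum(1 for s, y in pairs if y == 1 and s < cutoff)
    tn = sum(1 for s, y in pairs if y == 0 and s < cutoff)
    fp = sum(1 for s, y in pairs if y == 0 and s >= cutoff)
    return tp / (tp + fn), tn / (tn + fp)


# Toy, made-up risk scores (NOT study data); 1 = case, 0 = control.
toy_scores = [190, 186, 185, 181, 188, 183, 180, 179, 184, 176]
toy_labels = [1, 1, 1, 1, 0, 0, 0, 0, 1, 0]

auc = roc_auc(toy_scores, toy_labels)
sens, spec = sensitivity_specificity(toy_scores, toy_labels, cutoff=184)
print(f"AUC={auc:.3f} sensitivity={sens:.3f} specificity={spec:.3f}")
```

The pairwise AUC is quadratic in sample size, which is fine for a toy example; for GWA-scale cohorts a rank-based computation would be the more practical choice.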

SUBMITTER: Chuang LC

PROVIDER: S-EPMC5206749 | biostudies-literature | 2017 Jan

REPOSITORIES: biostudies-literature

Publications

Building a genetic risk model for bipolar disorder from genome-wide association data with random forest algorithm.

Chuang Li-Chung, Kuo Po-Hsiu

Scientific Reports, 2017-01-03

PMID: 28045094

Similar Datasets

A Random-Forest Based Algorithm for Prediction of Enhancers From Histone Modifications
2012-05-10 | GSE37858 | GEO

Meta-analysis of genome-wide association data of bipolar disorder and major depressive disorder.
S-EPMC3883627 | biostudies-literature

TSLRF: Two-Stage Algorithm Based on Least Angle Regression and Random Forest in genome-wide association studies.
S-EPMC6889171 | biostudies-literature

A Random-Forest Based Algorithm for Prediction of Enhancers From Histone Modifications
2012-05-09 | E-GEOD-37858 | biostudies-arrayexpress

The genetic association between personality and major depression or bipolar disorder. A polygenic score analysis using genome-wide association data.
S-EPMC3309491 | biostudies-literature

Pathway-based analysis for genome-wide association study data of bipolar disorder provides new insights for genetic study.
S-EPMC4656210 | biostudies-literature

tRForest: a novel random forest-based algorithm for tRNA-derived fragment target prediction
2022-05-16 | GSE189510 | GEO

Subtyping cognitive profiles in Autism Spectrum Disorder using a Functional Random Forest algorithm.
S-EPMC5969914 | biostudies-literature

Estimating disease prevalence from drug utilization data using the Random Forest algorithm.
S-EPMC6660107 | biostudies-literature

Exploiting SNP correlations within random forest for genome-wide association studies.
S-EPMC3973686 | biostudies-literature