Dataset Information

Machine learning for characterizing risk of type 2 diabetes mellitus in a rural Chinese population: the Henan Rural Cohort Study.

ABSTRACT: With the development of data mining, machine learning offers opportunities to improve discrimination by analyzing complex interactions among large numbers of variables. To test the ability of machine learning algorithms to predict risk of type 2 diabetes mellitus (T2DM) in a rural Chinese population, we focused on a total of 36,652 eligible participants from the Henan Rural Cohort Study. Risk assessment models for T2DM were developed using six machine learning algorithms: logistic regression (LR), classification and regression tree (CART), artificial neural networks (ANN), support vector machine (SVM), random forest (RF) and gradient boosting machine (GBM). Model performance was measured by the area under the receiver operating characteristic curve (AUC), sensitivity, specificity, positive predictive value, negative predictive value and the area under the precision-recall curve. The importance of variables was identified based on each classifier and the Shapley additive explanations approach. Using all available variables, all models for predicting risk of T2DM demonstrated strong predictive performance, with AUCs ranging from 0.811 to 0.872 with laboratory data and from 0.767 to 0.817 without laboratory data. Among them, the GBM model performed best (AUC: 0.872 with laboratory data and 0.817 without laboratory data). Model performance plateaued once 30 variables had been introduced into each model, except for the CART model. The top-10 variables across all methods included sweet flavor, urine glucose, age, heart rate, creatinine, waist circumference, uric acid, pulse pressure, insulin, and hypertension. Important risk factors not found by previous risk prediction methods (urinary indicators, sweet flavor) were identified by machine learning in our study. These results show that machine learning methods are competent at predicting risk of T2DM and can yield greater insight into disease risk factors with no a priori assumption of causality.
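The performance measures named in the abstract can be illustrated with a minimal sketch. This is not the authors' code: the function names, the 0.5 decision threshold, and the toy labels and scores are all assumptions made for illustration, and the AUC is computed through the rank-based (Mann-Whitney) formulation rather than by integrating a ROC curve.

```python
import math
from typing import Dict, Sequence


def _ratio(numerator: int, denominator: int) -> float:
    """Return numerator/denominator, or NaN when the denominator is zero."""
    return numerator / denominator if denominator else math.nan


def threshold_metrics(y_true: Sequence[int], y_score: Sequence[float],
                      threshold: float = 0.5) -> Dict[str, float]:
    """Sensitivity, specificity, PPV and NPV at a fixed score threshold.

    The threshold of 0.5 is an illustrative assumption, not a value
    taken from the study.
    """
    tp = fp = tn = fn = 0
    for label, score in zip(y_true, y_score):
        predicted = 1 if score >= threshold else 0
        if predicted == 1 and label == 1:
            tp += 1
        elif predicted == 1 and label == 0:
            fp += 1
        elif predicted == 0 and label == 0:
            tn += 1
        else:
            fn += 1
    return {
        "sensitivity": _ratio(tp, tp + fn),  # true positive rate
        "specificity": _ratio(tn, tn + fp),  # true negative rate
        "ppv": _ratio(tp, tp + fp),          # positive predictive value
        "npv": _ratio(tn, tn + fn),          # negative predictive value
    }


def roc_auc(y_true: Sequence[int], y_score: Sequence[float]) -> float:
    """AUC as the probability that a randomly chosen positive case scores
    higher than a randomly chosen negative case (ties count as one half)."""
    positives = [s for y, s in zip(y_true, y_score) if y == 1]
    negatives = [s for y, s in zip(y_true, y_score) if y == 0]
    if not positives or not negatives:
        return math.nan
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in positives for n in negatives)
    return wins / (len(positives) * len(negatives))


# Toy data (hypothetical, not from the Henan Rural Cohort Study).
toy_labels = [1, 1, 1, 0, 0, 0, 0, 1]
toy_scores = [0.9, 0.8, 0.4, 0.3, 0.6, 0.2, 0.1, 0.7]

if __name__ == "__main__":
    print(threshold_metrics(toy_labels, toy_scores))
    print(roc_auc(toy_labels, toy_scores))
```

The pairwise AUC loop is quadratic in the number of cases, which is fine for a toy example; for a cohort of this size a sort-based or library implementation would be the more practical choice.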

SUBMITTER: Zhang L

PROVIDER: S-EPMC7064542 | biostudies-literature | 2020 Mar

REPOSITORIES: biostudies-literature


Publications

Machine learning for characterizing risk of type 2 diabetes mellitus in a rural Chinese population: the Henan Rural Cohort Study.

Zhang Liying, Wang Yikang, Niu Miaomiao, Wang Chongjian, Wang Zhenfei

Scientific Reports, 2020-03-10, issue 1


PMID: 32157171

Similar Datasets

Project description: OBJECTIVE: The aims of this study were to describe the distribution of the prevalence of osteopenia and osteoporosis and to identify potential risk factors by gender in a Chinese rural population. DESIGN: A cross-sectional survey. SETTING AND PARTICIPANTS: A total of 8475 participants (18-79 years) were obtained from the Henan Rural Cohort Study. Bone mineral density (BMD) of the calcaneus was measured for each individual with an ultrasonic bone density apparatus. Logistic regression models were used to evaluate associations of potential risk factors with the prevalence of osteopenia and osteoporosis. Furthermore, a meta-analysis of the prevalence of osteoporosis, which included eight studies, was conducted to confirm the results of this study. RESULTS: Mean BMD was 0.42 and 0.32 g/cm² for men with osteopenia and osteoporosis (p<0.001), and 0.40 and 0.30 g/cm² (p<0.001) for women with osteopenia and osteoporosis, respectively. The overall age-standardised prevalence of osteopenia and osteoporosis was 42.09% and 11.76% in all participants. The age-standardised prevalence of osteopenia in men (45.98%) was significantly higher than that in women (39.73%), whereas the age-standardised prevalence of osteoporosis in men (7.82%) was lower than that in women (14.38%). The meta-analysis showed a pooled prevalence of osteoporosis of 18.0% (10.1%-25.8%) in the total sample, 7.7% (5.7%-9.7%) in men and 22.4% (17.1%-27.6%) in women. Multivariable logistic regression models showed that ageing, being a woman, low education level or income, drinking and underweight were related to increased risk of osteopenia or osteoporosis. CONCLUSIONS: About one-sixth of the participants suffered from osteoporosis in rural China, and the prevalence in women was higher than in men. Although these results were lower than those of the meta-analysis, osteoporosis still accounts for a huge burden of disease in the rural population, owing to limited medical services and a lack of health risk awareness compared with urban areas.
TRIAL REGISTRATION NUMBER: Chinese Clinical Trial Registry (ChiCTR-OOC-15006699; Pre-results).
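
The description above reports age-standardised prevalence without stating how it was computed. The sketch below assumes direct standardisation, in which stratum-specific prevalences are averaged with weights taken from a standard population. The function name, age bands, prevalences and weights are hypothetical and are not taken from the study.

```python
from typing import Dict


def age_standardised_prevalence(stratum_prevalence: Dict[str, float],
                                standard_weights: Dict[str, float]) -> float:
    """Direct standardisation: weighted mean of stratum-specific prevalences.

    standard_weights holds the size (or share) of each age band in the
    chosen standard population; the weights are normalised here.
    """
    total_weight = sum(standard_weights.values())
    if total_weight <= 0:
        raise ValueError("standard population weights must sum to a positive value")
    return sum(stratum_prevalence[band] * weight / total_weight
               for band, weight in standard_weights.items())


# Hypothetical age bands, prevalences and standard population sizes.
toy_prevalence = {"18-39": 0.02, "40-59": 0.10, "60-79": 0.30}
toy_standard = {"18-39": 50, "40-59": 30, "60-79": 20}

if __name__ == "__main__":
    print(age_standardised_prevalence(toy_prevalence, toy_standard))
```

With these toy inputs the weighted mean is (0.02·50 + 0.10·30 + 0.30·20) / 100 = 0.10, that is, a standardised prevalence of 10%.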

