Dataset Information

Simulation-assisted machine learning.

ABSTRACT:

Motivation

In a predictive modeling setting, if sufficient details of the system behavior are known, one can build and use a simulation for making predictions. When sufficient system details are not known, one typically turns to machine learning, which builds a black-box model of the system using a large dataset of input sample features and outputs. We consider a setting which is between these two extremes: some details of the system mechanics are known but not enough for creating simulations that can be used to make high quality predictions. In this context we propose using approximate simulations to build a kernel for use in kernelized machine learning methods, such as support vector machines. The results of multiple simulations (under various uncertainty scenarios) are used to compute similarity measures between every pair of samples: sample pairs are given a high similarity score if they behave similarly under a wide range of simulation parameters. These similarity values, rather than the original high dimensional feature data, are used to build the kernel.
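The kernel construction described above can be illustrated with a small sketch. It is not the SimKern implementation: the function names `approximate_simulation` and `simulation_kernel`, the toy linear "simulation", the Gaussian parameter uncertainty, and the agreement-based similarity score are all assumptions chosen for illustration. Each sample is simulated under the same set of randomly drawn parameter scenarios, and two samples are scored as similar in proportion to how often their simulated outcomes agree.

```python
import random


def approximate_simulation(features, params):
    """Hypothetical stand-in for an approximate domain simulation.

    Returns a discrete outcome (0 or 1) for one sample under one
    set of uncertain simulation parameters.
    """
    score = sum(w * x for w, x in zip(params, features))
    return 1 if score > 0 else 0


def simulation_kernel(samples, n_scenarios=200, seed=0):
    """Build a similarity matrix from simulated behavior (assumed scoring rule)."""
    rng = random.Random(seed)
    n_features = len(samples[0])

    # Draw the uncertain parameters once per scenario; every sample is
    # simulated under the same scenarios so outcomes are comparable.
    scenarios = [
        [rng.gauss(1.0, 0.5) for _ in range(n_features)]
        for _ in range(n_scenarios)
    ]
    outcomes = [
        [approximate_simulation(sample, params) for params in scenarios]
        for sample in samples
    ]

    # Similarity = fraction of scenarios in which two samples behave alike.
    n = len(samples)
    kernel = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i, n):
            agree = sum(a == b for a, b in zip(outcomes[i], outcomes[j]))
            kernel[i][j] = kernel[j][i] = agree / n_scenarios
    return kernel


# Toy data: the first two samples are close, the third is their mirror image.
samples = [[1.0, 2.0], [1.1, 1.9], [-1.0, -2.0]]
K = simulation_kernel(samples)
```

A matrix of this kind could then be handed to a kernelized learner in place of the raw features, for example scikit-learn's `SVC(kernel="precomputed")`, which accepts a precomputed Gram matrix. Averaging per-scenario agreement indicators keeps the matrix symmetric and positive semidefinite, which is why this scoring rule was chosen for the sketch.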

Results

We demonstrate and explore the simulation-based kernel (SimKern) concept using four synthetic complex systems: three biologically inspired models and one network flow optimization model. We show that, when the number of training samples is small compared to the number of features, the SimKern approach dominates over no-prior-knowledge methods. This approach should be applicable in all disciplines where predictive models are sought and informative yet approximate simulations are available.

Availability and implementation

The Python SimKern software, the demonstration models (in MATLAB, R), and the datasets are available at https://github.com/davidcraft/SimKern.

Supplementary information

Supplementary data are available at Bioinformatics online.

SUBMITTER: Deist TM

PROVIDER: S-EPMC6792064 | biostudies-literature | 2019 Oct

REPOSITORIES: biostudies-literature

Publications

Simulation-assisted machine learning.

Deist TM, Patti A, Wang Z, Krane D, Sorenson T, Craft D

Bioinformatics (Oxford, England), 2019 Oct 1, issue 20

PMID: 30903692

Similar Datasets

Project description:

Background

Recently, the dental age estimation method developed by Cameriere has been widely recognized and accepted. Although machine learning (ML) methods can improve the accuracy of dental age estimation, no machine learning research exists on the use of the Cameriere dental age estimation method, making this research innovative and meaningful.

Aim

The purpose of this research is to use seven lower left permanent teeth and three models [random forest (RF), support vector machine (SVM), and linear regression (LR)] based on the Cameriere method to predict children's dental age, and to compare the results with the Cameriere age estimation.

Subjects and methods

This was a retrospective study that collected and analyzed orthopantomograms of 748 children (356 females and 392 males) aged 5-13 years. Data were randomly divided into training and test datasets in an 80%/20% proportion for the ML algorithms. The procedure, starting with randomly creating new training and test datasets, was repeated 20 times. Seven permanent developing teeth on the left mandible (excluding wisdom teeth) were recorded using the Cameriere method. Then, the traditional Cameriere formula and three models (RF, SVM, and LR) were used to estimate the dental age. The age prediction accuracy was measured by five indicators: the coefficient of determination (R2), mean error (ME), root mean square error (RMSE), mean square error (MSE), and mean absolute error (MAE).

Results

The research showed that the ML models have better accuracy than the traditional Cameriere formula. The ME, MAE, MSE, and RMSE values of the SVM model (0.004, 0.489, 0.392, and 0.625, respectively) and the RF model (-0.004, 0.495, 0.389, and 0.623, respectively) were the lowest, corresponding to the highest accuracy. In contrast, the ME, MAE, MSE, and RMSE of the European Cameriere formula were 0.592, 0.846, 0.755, and 0.869, respectively, and those of the Chinese Cameriere formula were 0.748, 0.812, 0.890, and 0.943, respectively.

Conclusions

Compared to the Cameriere formula, ML methods based on Cameriere's maturation stages were more accurate in estimating dental age. These results support the use of ML algorithms instead of the traditional Cameriere formula.
