ABSTRACT: Background: Remodeling due to myocardial infarction (MI) significantly increases patient arrhythmic risk. Simulations using patient-specific models have shown promise in predicting personalized risk for arrhythmia. However, these are computationally- and time- intensive, hindering translation to clinical practice. Classical machine learning (ML) algorithms (such as K-nearest neighbors, Gaussian support vector machines, and decision trees) as well as neural network techniques, shown to increase prediction accuracy, can be used to predict occurrence of arrhythmia as predicted by simulations based solely on infarct and ventricular geometry. We present an initial combined image-based patient-specific in silico and machine learning methodology to assess risk for dangerous arrhythmia in post-infarct patients. Furthermore, we aim to demonstrate that simulation-supported data augmentation improves prediction models, combining patient data, computational simulation, and advanced statistical modeling, improving overall accuracy for arrhythmia risk assessment. Methods: MRI-based computational models were constructed from 30 patients 5 days post-MI (the “baseline” population). In order to assess the utility biophysical model-supported data augmentation for improving arrhythmia prediction, we augmented the virtual baseline patient population. Each patient ventricular and ischemic geometry in the baseline population was used to create a subfamily of geometric models, resulting in an expanded set of patient models (the “augmented” population). Arrhythmia induction was attempted via programmed stimulation at 17 sites for each virtual patient corresponding to AHA LV segments and simulation outcome, “arrhythmia,” or “no-arrhythmia,” were used as ground truth for subsequent statistical prediction (machine learning, ML) models. For each patient geometric model, we measured and used choice data features: the myocardial volume and ischemic volume, as well as the segment-specific myocardial volume and ischemia percentage, as input to ML algorithms. For classical ML techniques (ML), we trained k-nearest neighbors, support vector machine, logistic regression, xgboost, and decision tree models to predict the simulation outcome from these geometric features alone. To explore neural network ML techniques, we trained both a three - and a four-hidden layer multilayer perceptron feed forward neural networks (NN), again predicting simulation outcomes from these geometric features alone. ML and NN models were trained on 70% of randomly selected segments and the remaining 30% was used for validation for both baseline and augmented populations. Results: Stimulation in the baseline population (30 patient models) resulted in reentry in 21.8% of sites tested; in the augmented population (129 total patient models) reentry occurred in 13.0% of sites tested. ML and NN models ranged in mean accuracy from 0.83 to 0.86 for the baseline population, improving to 0.88 to 0.89 in all cases. Conclusion: Machine learning techniques, combined with patient-specific, image-based computational simulations, can provide key clinical insights with high accuracy rapidly and efficiently. In the case of sparse or missing patient data, simulation-supported data augmentation can be employed to further improve predictive results for patient benefit. This work paves the way for using data-driven simulations for prediction of dangerous arrhythmia in MI patients.