Model-Based Sensitivity Analysis of Nondestructive Testing Systems Using Machine Learning Algorithms
ABSTRACT: Model-based sensitivity analysis is crucial for quantifying which input variability parameters are important for nondestructive testing (NDT) systems. In this work, neural networks (NN) and convolutional NN (CNN) are shown to be computationally efficient at making model predictions for NDT systems when compared to models such as polynomial chaos expansions, Kriging and polynomial chaos Kriging (PC-Kriging). Three different ultrasonic benchmark cases are considered. NN outperformed these three models for all the cases, while CNN outperformed them for two of the three cases. For the third case, CNN performed as well as PC-Kriging. NN required 48, 56 and 35 high-fidelity model evaluations, respectively, for the three cases to reach within
Project description:Background: Epilepsy is a disorder that can manifest as abnormalities in neurological or physical function. Stress cardiomyopathy is closely associated with neurological stimulation. However, the mechanisms underlying the interrelationship between epilepsy and stress cardiomyopathy are unclear. This paper aims to explore the genetic features and potential molecular mechanisms shared by epilepsy and stress cardiomyopathy. Methods: The epilepsy dataset and the stress cardiomyopathy dataset were analyzed separately, and the differentially expressed genes co-expressed in the two diseases were obtained from their intersection. These co-expressed genes were used to reveal the shared biological functions, to construct a network, and to identify the core modules that reveal the interaction mechanism. Co-expressed genes with diagnostic validity were screened by machine learning algorithms, and the co-expressed genes were validated in parallel on the epilepsy single-cell data and the stress cardiomyopathy rat model. Results: Epilepsy causes stress cardiomyopathy; its key pathways are the Complement and coagulation cascades and the HIF-1 signaling pathway, and its key co-expressed genes include SPOCK2, CTSZ, HLA-DMB, ALDOA, SFRP1, and ERBB3. The key immune cell subpopulations localized by single-cell data are the T_cells subgroup, Microglia subgroup, Macrophage subgroup, Astrocyte subgroup, and Oligodendrocytes subgroup. Conclusion: We believe that epilepsy causing stress cardiomyopathy results from a multi-gene, multi-pathway combination. We identified the core co-expressed genes (SPOCK2, CTSZ, HLA-DMB, ALDOA, SFRP1, ERBB3) and the pathways in which they function (Complement and coagulation cascades, HIF-1 signaling pathway, JAK-STAT signaling pathway), and finally localized their key cellular subgroups (T_cells subgroup, Microglia subgroup, Macrophage subgroup, Astrocyte subgroup, and Oligodendrocytes subgroup).
Also, combining cell subpopulations with hypercoagulability as well as sympathetic excitation further narrowed the cell subpopulations of related functions.
Project description:Rising global population and climate change realities dictate that agricultural productivity must be accelerated. Results from current traditional research approaches are difficult to extrapolate to all possible fields because they depend on specific soil types, weather conditions, and background management combinations that are neither applicable nor translatable to all farms. A method that accurately evaluates the effectiveness of infinite cropping system interactions (involving multiple management practices) to increase maize and soybean yield across the US does not exist. Here, we utilize extensive databases and artificial intelligence algorithms and show that complex interactions, which cannot be evaluated in replicated trials, are associated with large crop yield variability and thus with potential for substantial yield increases. Our approach can accelerate agricultural research, identify sustainable practices, and help overcome future food demands.
Project description:Objective: To develop a predictive model of incidence of traumatic spinal cord injury (TSCI). Methods: The data for training the model included both the incidence data and the covariates. The incidence data were extracted from systematic reviews and the covariates were extracted from data available in the International Road Federation database. Feature processing was then carried out. First, we defined a hyper-parameter, the missing-value threshold, and eliminated features whose share of missing values exceeded this threshold. To reduce overfitting, we computed the Pearson correlation between features and excluded those with a correlation of more than 0.7. After feature selection, three different models (simple linear regression, support vector regression, and multi-layer perceptron) were examined to fit the purposes of this study. Finally, we evaluated the model based on three standard metrics: Mean Absolute Error, Root Mean Square Error, and R². Results: Our machine-learning based model could predict the incidence rate of TSCI with a mean absolute error of 4.66. Our model found "Vehicles in use, Total vehicles/Km of roads", "Injury accidents/100 Million Veh-Km", "Vehicles in use, Vans, Pick-ups, Lorries, Road Tractors", "Inland surface Passengers Transport (Mio Passenger-Km), Rail", and "% paved" as top predictors of transport-related TSCI (TRTSCI). Conclusions: Our model proved to have high accuracy in predicting the incidence rate of TSCI for countries, especially where the main etiology of TSCI is related to road traffic injuries. Using this model, we can help policymakers with resource allocation and the evaluation of preventive measures.
Project description:Objective: Population aging is increasing worldwide, which has led to the emergence of the successful aging (SA) concept. It is believed that an SA prediction model can increase the quality of life (QoL) in the elderly by decreasing physical and mental problems and enhancing their social participation. Most previous studies noted that physical and mental disorders affected the QoL in the elderly but did not pay much attention to the social factors in this respect. Our study aimed to build a prediction model for SA based on the physical, mental, and especially social factors affecting SA. Methods: A total of 975 cases related to SA and non-SA of the elderly were investigated in this study. We used univariate analysis to determine the best factors affecting SA. AB, XG-Boost, J-48, RF, artificial neural network, support vector machine, and NB algorithms were used for building the prediction models. To get the best model for predicting SA, we compared them using positive predictive value (PPV), negative predictive value (NPV), sensitivity, specificity, accuracy, F-measure, and area under the receiver operator characteristics curve (AUC). Results: Comparing the performance of the machine learning models showed that the random forest (RF) model, with PPV = 90.96%, NPV = 99.21%, sensitivity = 97.48%, specificity = 97.14%, accuracy = 97.05%, F-score = 97.31%, and AUC = 0.975, is the best model for predicting SA. Conclusions: Using prediction models can increase the QoL in the elderly and consequently reduce the economic cost for people and societies. The RF can be considered an optimal model for predicting SA in the elderly.
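For readers unfamiliar with the comparison metrics named in the SA abstract, the sketch below shows how PPV, NPV, sensitivity, specificity, accuracy, and F-measure follow from the four cells of a binary confusion matrix using their standard definitions. The function name and the example counts are hypothetical and are not taken from the study.

```python
def binary_metrics(tp, fp, tn, fn):
    """Standard binary classification metrics from confusion-matrix counts."""
    ppv = tp / (tp + fp)            # positive predictive value (precision)
    npv = tn / (tn + fn)            # negative predictive value
    sensitivity = tp / (tp + fn)    # true positive rate (recall)
    specificity = tn / (tn + fp)    # true negative rate
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    # F-measure: harmonic mean of PPV and sensitivity.
    f_measure = 2 * ppv * sensitivity / (ppv + sensitivity)
    return {
        "ppv": ppv,
        "npv": npv,
        "sensitivity": sensitivity,
        "specificity": specificity,
        "accuracy": accuracy,
        "f_measure": f_measure,
    }


# Hypothetical counts, chosen only to illustrate the arithmetic.
example = binary_metrics(tp=90, fp=10, tn=80, fn=20)
for name, value in example.items():
    print(f"{name}: {value:.4f}")
```

AUC is not included because it depends on the ranking of predicted scores rather than on a single confusion matrix.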
Project description:We propose a novel method that predicts binding of G-protein coupled receptors (GPCRs) and ligands. The proposed method uses hub and cycle structures of ligands and amino acid motif sequences of GPCRs, rather than the 3D structure of a receptor or similarity of receptors or ligands. The experimental results show that these new features can be effective in predicting GPCR-ligand binding (average area under the curve [AUC] of 0.944), because they are thought to include hidden properties of good ligand-receptor binding. Using the proposed method, we were able to identify novel ligand-GPCR bindings, some of which are supported by several studies.
Project description:In this work, plasma samples of 5 metabolic syndrome patients and 5 healthy volunteers were collected. Then, high-throughput RNA sequencing was performed to detect the expression of plasma coding RNA.
Project description:Alzheimer's disease (AD) has its onset many decades before dementia develops, and work is ongoing to characterise individuals at risk of decline on the basis of early detection through biomarker and cognitive testing as well as the presence/absence of identified risk factors. Risk prediction models for AD based on various computational approaches, including machine learning, are being developed with promising results. However, these approaches have been criticised as they are unable to generalise due to over-reliance on one data source, poor internal and external validations, and lack of understanding of prediction models, thereby limiting the clinical utility of these prediction models. We propose a framework that employs a transfer-learning paradigm with ensemble learning algorithms to develop explainable personalised risk prediction models for dementia. Our prediction models, known as source models, are initially trained and tested using a publicly available dataset (n = 84,856, mean age = 69 years) with 14 years of follow-up samples to predict the individual risk of developing dementia. The decision boundaries of the best source model are further updated by using an alternative dataset from a different and much younger population (n = 473, mean age = 52 years) to obtain an additional prediction model known as the target model. We further apply the SHapley Additive exPlanations (SHAP) algorithm to visualise the risk factors responsible for the prediction at both population and individual levels. The best source model achieves a geometric accuracy of 87%, specificity of 99%, and sensitivity of 76%. In comparison to a baseline model, our target model achieves better performance across several performance metrics, with an increase in geometric accuracy of 16.9%, specificity of 2.7%, and sensitivity of 19.1%, an area under the receiver operating curve (AUROC) of 11% and a transfer learning efficacy rate of 20.6%. 
The strength of our approach is the large sample size used in training the source model, transferring and applying the "knowledge" to another dataset from a different and undiagnosed population for the early detection and prediction of dementia risk, and the ability to visualise the interaction of the risk factors that drive the prediction. This approach has direct clinical utility.
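The dementia abstract does not define "geometric accuracy". A common reading, assumed here, is the geometric mean of sensitivity and specificity (often called the G-mean). The short sketch below computes it under that assumption; the function name is hypothetical. The reported source-model figures (specificity 99%, sensitivity 76%, geometric accuracy 87%) are numerically consistent with this reading, though that does not confirm it is the authors' definition.

```python
import math


def geometric_mean_accuracy(sensitivity, specificity):
    """G-mean: square root of sensitivity times specificity (assumed definition)."""
    return math.sqrt(sensitivity * specificity)


# Sensitivity and specificity as reported for the best source model.
g_mean = geometric_mean_accuracy(sensitivity=0.76, specificity=0.99)
print(f"{g_mean:.3f}")
```

Unlike plain accuracy, this quantity drops sharply when either class is poorly recognised, which is why it is often preferred for imbalanced data.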
Project description:The relationships between the fatigue crack growth rate (da/dN) and stress intensity factor range (ΔK) are not always linear even in the Paris region. The stress ratio effects on fatigue crack growth rate are diverse in different materials. However, most existing fatigue crack growth models cannot handle these nonlinearities appropriately. The machine learning method provides a flexible approach to the modeling of fatigue crack growth because of its excellent nonlinear approximation and multivariable learning ability. In this paper, a fatigue crack growth calculation method is proposed based on three different machine learning algorithms (MLAs): extreme learning machine (ELM), radial basis function network (RBFN) and genetic algorithms optimized back propagation network (GABP). The MLA based method is validated using testing data of different materials. The three MLAs are compared with each other as well as the classical two-parameter model (K* approach). The results show that the predictions of MLAs are superior to those of K* approach in accuracy and effectiveness, and the ELM based algorithms show overall the best agreement with the experimental data out of the three MLAs, for its global optimization and extrapolation ability.
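As background for the "linear in the Paris region" remark, the classical Paris relation da/dN = C (ΔK)^m becomes a straight line on log-log axes: log10(da/dN) = log10(C) + m log10(ΔK). The sketch below fits C and m by ordinary least squares in log space. It illustrates only this linear baseline, not the ELM, RBFN, or GABP methods of the paper; the function name and the synthetic data (C = 1e-11, m = 3) are assumptions made for illustration.

```python
import math


def fit_paris_law(delta_k, dadn):
    """Least-squares fit of da/dN = C * delta_k**m on log-log axes.

    Returns (C, m). Inputs must be positive and of equal length.
    """
    x = [math.log10(v) for v in delta_k]
    y = [math.log10(v) for v in dadn]
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Slope of the log-log line is the Paris exponent m.
    m = (sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
         / sum((xi - mean_x) ** 2 for xi in x))
    # Intercept of the log-log line is log10(C).
    log_c = mean_y - m * mean_x
    return 10.0 ** log_c, m


# Synthetic, noise-free data generated from assumed constants.
delta_k_values = [10.0, 15.0, 20.0, 30.0, 40.0]
growth_rates = [1e-11 * dk ** 3 for dk in delta_k_values]
c_fit, m_fit = fit_paris_law(delta_k_values, growth_rates)
print(c_fit, m_fit)
```

Measured data that curve away from this straight line on log-log axes are the kind of nonlinearity the abstract says such models handle poorly.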
Project description:Identification of medical conditions using claims data is generally conducted with algorithms based on subject-matter knowledge. However, these claims-based algorithms (CBAs) are highly dependent on the knowledge level and not necessarily optimized for target conditions. We investigated whether machine learning methods can supplement researchers' knowledge of target conditions in building CBAs. Retrospective cohort study using a claims database combined with annual health check-up results of employees' health insurance programs for fiscal year 2016-17 in Japan (study population for hypertension, N = 631,289; diabetes, N = 152,368; dyslipidemia, N = 614,434). We constructed CBAs with logistic regression, k-nearest neighbor, support vector machine, penalized logistic regression, tree-based model, and neural network for identifying patients with three common chronic conditions: hypertension, diabetes, and dyslipidemia. We then compared their association measures using a completely hold-out test set (25% of the study population). Among the test cohorts of 157,822, 38,092, and 153,608 enrollees for hypertension, diabetes, and dyslipidemia, 25.4%, 8.4%, and 38.7% of them had a diagnosis of the corresponding condition. The areas under the receiver operating characteristic curve (AUCs) of the logistic regression with/without subject-matter knowledge about the target condition were .923/.921 for hypertension, .957/.938 for diabetes, and .739/.747 for dyslipidemia. The logistic lasso, logistic elastic-net, and tree-based methods yielded AUCs comparable to those of the logistic regression with subject-matter knowledge: .923-.931 for hypertension; .958-.966 for diabetes; .747-.773 for dyslipidemia. We found that machine learning methods can attain AUCs comparable to the conventional knowledge-based method in building CBAs.
Project description:In order to limit the spread of the novel betacoronavirus (SARS-CoV-2), it is necessary to detect positive cases as soon as possible and isolate them. For this purpose, machine-learning algorithms, as a field of artificial intelligence, have been recognized as a promising tool. The aim of this study was to assess the utility of the most common machine-learning algorithms in the rapid triage of children with suspected COVID-19 using easily accessible and inexpensive laboratory parameters. A cross-sectional study was conducted on 566 children treated for respiratory diseases: 280 children with PCR-confirmed SARS-CoV-2 infection and 286 children with respiratory symptoms who were SARS-CoV-2 PCR-negative (control group). Six machine-learning algorithms, based on the blood laboratory data, were tested: random forest, support vector machine, linear discriminant analysis, artificial neural network, k-nearest neighbors, and decision tree. The training set was validated through stratified cross-validation, while the performance of each algorithm was confirmed by an independent test set. Random forest and support vector machine models demonstrated the highest accuracy of 85% and 82.1%, respectively. The models demonstrated better sensitivity than specificity and better negative predictive value than positive predictive value. The F1 score was higher for the random forest than for the support vector machine model, 85.2% and 82.3%, respectively. This study might have significant clinical applications, helping healthcare providers identify children with COVID-19 in the early stage, prior to PCR and/or antigen testing. Additionally, machine-learning algorithms could improve overall testing efficiency with no extra costs for the healthcare facility.