Dataset Information

Ensemble modeling with machine learning and deep learning to provide interpretable generalized rules for classifying CNS drugs with high prediction power.

ABSTRACT: The trade-off between a machine learning (ML) and deep learning (DL) model's predictability and its interpretability has been a rising concern in central nervous system-related quantitative structure-activity relationship (CNS-QSAR) analysis. Many state-of-the-art predictive modeling failed to provide structural insights due to their black box-like nature. Lack of interpretability and further to provide easy simple rules would be challenging for CNS-QSAR models. To address these issues, we develop a protocol to combine the power of ML and DL to generate a set of simple rules that are easy to interpret with high prediction power. A data set of 940 market drugs (315 CNS-active, 625 CNS-inactive) with support vector machine and graph convolutional network algorithms were used. Individual ML/DL modeling methods were also constructed for comparison. The performance of these models was evaluated using an additional external dataset of 117 market drugs (42 CNS-active, 75 CNS-inactive). Fingerprint-split validation was adopted to ensure model stringency and generalizability. The resulting novel hybrid ensemble model outperformed other constituent traditional QSAR models with an accuracy of 0.96 and an F1 score of 0.95. With the power of the interpretability provided with this protocol, our model laid down a set of simple physicochemical rules to determine whether a compound can be a CNS drug using six sub-structural features. These rules displayed higher classification ability than classical guidelines, with higher specificity and more mechanistic insights than just for blood-brain barrier permeability. This hybrid protocol can potentially be used for other drug property predictions.

SUBMITTER: Yu TH

PROVIDER: S-EPMC8769704 | biostudies-literature |

REPOSITORIES: biostudies-literature

ACCESS DATA

Similar Datasets

Project description:BackgroundNowadays doctors and radiologists are overwhelmed with a huge amount of work. This led to the effort to design different Computer-Aided Diagnosis systems (CAD system), with the aim of accomplishing a faster and more accurate diagnosis. The current development of deep learning is a big opportunity for the development of new CADs. In this paper, we propose a novel architecture for a convolutional neural network (CNN) ensemble for classifying chest X-ray (CRX) images into four classes: viral Pneumonia, Tuberculosis, COVID-19, and Healthy. Although Computed tomography (CT) is the best way to detect and diagnoses pulmonary issues, CT is more expensive than CRX. Furthermore, CRX is commonly the first step in the diagnosis, so it's very important to be accurate in the early stages of diagnosis and treatment.ResultsWe applied the transfer learning technique and data augmentation to all CNNs for obtaining better performance. We have designed and evaluated two different CNN-ensembles: Stacking and Voting. This system is ready to be applied in a CAD system to automated diagnosis such a second or previous opinion before the doctors or radiology's. Our results show a great improvement, 99% accuracy of the Stacking Ensemble and 98% of accuracy for the the Voting Ensemble.ConclusionsTo minimize missclassifications, we included six different base CNN models in our architecture (VGG16, VGG19, InceptionV3, ResNet101V2, DenseNet121 and CheXnet) and it could be extended to any number as well as we expect extend the number of diseases to detected. The proposed method has been validated using a large dataset created by mixing several public datasets with different image sizes and quality. As we demonstrate in the evaluation carried out, we reach better results and generalization compared with previous works. In addition, we make a first approach to explainable deep learning with the objective of providing professionals more information that may be valuable when evaluating CRXs.

Dataset Information

Ensemble modeling with machine learning and deep learning to provide interpretable generalized rules for classifying CNS drugs with high prediction power.

Similar Datasets

OmicsDI is part of the ELIXIR infrastructure

Tweets