
Dataset Information


DPI_CDF: druggable protein identifier using cascade deep forest.


ABSTRACT:

Background

Drug targets in living organisms play pivotal roles in the discovery of potential drugs. Conventional wet-lab characterization of drug targets is accurate but generally expensive, slow, and resource-intensive. Computational methods are therefore highly desirable as an alternative to expedite the large-scale identification of druggable proteins (DPs); however, the performance of existing in silico predictors is still not satisfactory.

Methods

In this study, we developed DPI_CDF, a novel deep learning-based model for predicting DPs from protein sequence alone. DPI_CDF generates features from evolutionary-based (i.e., histograms of oriented gradients for position-specific scoring matrix), physicochemical-based (i.e., component protein sequence representation), and compositional-based (i.e., normalized qualitative characteristic) properties of the protein sequence. A hierarchical deep forest model then fuses these three encoding schemes to build the proposed model.
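The layered structure described in the Methods can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes the three encodings are fused by simple concatenation, and it replaces trained forests with fixed toy scorers (`make_toy_scorer` is a hypothetical stand-in). Only the cascade idea itself is shown, where each level passes its class-probability vectors, appended to the original features, on to the next level. All names and feature values are illustrative.

```python
import math

def fuse(*feature_vectors):
    """Concatenate several per-protein feature vectors (assumed fusion strategy)."""
    fused = []
    for vec in feature_vectors:
        fused.extend(vec)
    return fused

def make_toy_scorer(weight):
    """Hypothetical stand-in for a trained forest: returns [P(non-DP), P(DP)]."""
    def scorer(features):
        mean = sum(features) / len(features)
        p_pos = 1.0 / (1.0 + math.exp(-weight * mean))
        return [1.0 - p_pos, p_pos]
    return scorer

def run_cascade(base_features, levels):
    """Pass one sample through a cascade; each level is a list of estimators."""
    if not levels or any(len(estimators) == 0 for estimators in levels):
        raise ValueError("each cascade level needs at least one estimator")
    current = list(base_features)
    level_probs = []
    for estimators in levels:
        level_probs = [est(current) for est in estimators]
        # The next level sees the original features plus this level's class vectors.
        current = list(base_features) + [p for probs in level_probs for p in probs]
    # Final output: average of the last level's class-probability vectors.
    n = len(level_probs)
    return [sum(probs[k] for probs in level_probs) / n for k in range(2)]

# Made-up feature values standing in for the three encodings.
pssm_hog = [0.2, 0.4]
cpsr = [0.1]
nqlc = [0.5, 0.3]
fused = fuse(pssm_hog, cpsr, nqlc)

levels = [
    [make_toy_scorer(1.0), make_toy_scorer(2.0)],
    [make_toy_scorer(0.5), make_toy_scorer(1.5)],
]
probs = run_cascade(fused, levels)
print("class probabilities [non-DP, DP]:", probs)
```

In a real deep forest, each estimator would be a forest trained on labeled data, and the number of levels is typically chosen by monitoring validation performance rather than fixed in advance.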

Results

Under 10-fold cross-validation, the proposed model achieved 99.13% accuracy and a Matthews correlation coefficient (MCC) of 0.982 on the training dataset. The generalization power of the trained model was further examined on an independent dataset, where it achieved a maximum accuracy of 95.01% and an MCC of 0.900. Compared to current state-of-the-art methods, DPI_CDF improves accuracy by 4.27% and 4.31% on the training and testing datasets, respectively. We believe DPI_CDF will help the research community identify druggable proteins and accelerate the drug discovery process.
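For readers unfamiliar with the two metrics reported above, the following sketch computes accuracy and MCC from binary labels using the standard confusion-matrix definitions. The labels are made-up toy data, not results from the paper, and the function names are illustrative.

```python
import math

def confusion_counts(y_true, y_pred):
    """Count TP, TN, FP, FN for binary labels (1 = druggable, 0 = non-druggable)."""
    tp = tn = fp = fn = 0
    for t, p in zip(y_true, y_pred):
        if t == 1 and p == 1:
            tp += 1
        elif t == 0 and p == 0:
            tn += 1
        elif t == 0 and p == 1:
            fp += 1
        else:
            fn += 1
    return tp, tn, fp, fn

def accuracy(tp, tn, fp, fn):
    """Fraction of samples classified correctly."""
    total = tp + tn + fp + fn
    return (tp + tn) / total if total else 0.0

def mcc(tp, tn, fp, fn):
    """Matthews correlation coefficient; returns 0.0 when the denominator is zero."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0

# Toy labels for illustration only.
y_true = [1, 1, 1, 1, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0, 0, 1]

counts = confusion_counts(y_true, y_pred)
print("accuracy:", accuracy(*counts))  # 0.75
print("MCC:", mcc(*counts))            # 0.5
```

MCC ranges from -1 to 1 and accounts for all four confusion-matrix cells, which makes it more informative than accuracy alone when the two classes are imbalanced.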

Availability

The benchmark datasets and source code are available on GitHub: http://github.com/Muhammad-Arif-NUST/DPI_CDF

SUBMITTER: Arif M 

PROVIDER: S-EPMC11334562 | biostudies-literature | 2024 Apr

REPOSITORIES: biostudies-literature


Publications

DPI_CDF: druggable protein identifier using cascade deep forest.

Arif M, Fang G, Ghulam A, Musleh S, Alam T

BMC Bioinformatics, 2024-04-05 (issue 1)



Similar Datasets

| S-EPMC8460650 | biostudies-literature
| S-EPMC7591901 | biostudies-literature
| S-EPMC9369210 | biostudies-literature
| S-EPMC5167203 | biostudies-literature
| S-EPMC5889581 | biostudies-literature
2020-08-17 | GSE156313 | GEO
| S-EPMC11493656 | biostudies-literature
| S-EPMC11214090 | biostudies-literature
| S-EPMC10728196 | biostudies-literature
| S-EPMC4498304 | biostudies-literature