Dataset Information

SPVec: A Word2vec-Inspired Feature Representation Method for Drug-Target Interaction Prediction.


ABSTRACT: Drug discovery is an academic and commercial process of global importance. Accurate identification of drug-target interactions (DTIs) can significantly facilitate the drug discovery process. Compared to costly, labor-intensive and time-consuming experimental methods, machine learning (ML) plays an increasingly important role in effective, efficient and high-throughput identification of DTIs. However, upstream feature extraction methods require tremendous human resources and expert insight, which limits the application of ML approaches. Inspired by unsupervised representation learning methods such as Word2vec, we propose SPVec, a novel way to automatically represent raw data such as SMILES strings and protein sequences as continuous, information-rich, lower-dimensional vectors, avoiding the sparseness and bit collisions of cumbersome manually extracted features. Visualization of SPVec illustrated that similar compounds or proteins occupy similar regions of the vector space, which indicates that SPVec not only encodes compound substructures and protein sequences efficiently, but also implicitly reveals some important biophysical and biochemical patterns. Compared with manually designed features such as MACCS fingerprints and amino acid composition (AAC), SPVec showed better performance with several state-of-the-art machine learning classifiers, namely Gradient Boosting Decision Tree, Random Forest and Deep Neural Network, on BindingDB. The performance and robustness of SPVec were also confirmed on independent test sets obtained from the DrugBank database. In addition, based on the whole DrugBank dataset, we predicted the likelihood of interaction for all unlabeled drug-target pairs; two of the top five predicted novel DTIs were supported by external evidence. These results indicate that SPVec can provide an effective and efficient way to discover reliable DTIs, which would be beneficial for drug reprofiling.
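
To make the contrast in the abstract concrete, the sketch below computes the AAC baseline (the fraction of each of the 20 standard amino acids in a sequence) and splits a protein sequence into overlapping k-mers that could serve as "words" for a Word2vec-style model. This is an illustration only: the function names, the choice of k = 3, and the overlapping-window tokenization are assumptions, not details taken from the SPVec paper, and no embedding model is trained here.

```python
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues


def aac(sequence):
    """Amino acid composition: fraction of each standard residue.

    Non-standard characters are ignored; an empty or fully
    non-standard sequence yields a vector of zeros.
    """
    counts = Counter(sequence.upper())
    total = sum(counts[a] for a in AMINO_ACIDS)
    if total == 0:
        return [0.0] * len(AMINO_ACIDS)
    return [counts[a] / total for a in AMINO_ACIDS]


def kmer_words(sequence, k=3):
    """Split a sequence into overlapping k-mers ("words").

    k = 3 is a hypothetical choice for illustration, not a
    setting confirmed by the source.
    """
    seq = sequence.upper()
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


if __name__ == "__main__":
    seq = "MKTAYIAK"
    print(aac(seq))         # fixed-length, hand-designed feature vector
    print(kmer_words(seq))  # token list a Word2vec-style model could consume
```

AAC always yields a 20-dimensional vector and discards residue order, whereas the k-mer list preserves local order and can be fed to an embedding model to learn dense vectors from unlabeled sequences.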

SUBMITTER: Zhang YF 

PROVIDER: S-EPMC6967417 | biostudies-literature | 2019

REPOSITORIES: biostudies-literature

Publications

SPVec: A Word2vec-Inspired Feature Representation Method for Drug-Target Interaction Prediction.

Zhang Yu-Fang, Wang Xiangeng, Kaushik Aman Chandra, Chu Yanyi, Shan Xiaoqi, Zhao Ming-Zhu, Xu Qin, Wei Dong-Qing

Frontiers in Chemistry, 2020-01-10


Similar Datasets

| S-EPMC6928730 | biostudies-literature
| S-EPMC3646965 | biostudies-literature
| S-EPMC8098026 | biostudies-literature
| S-EPMC8414716 | biostudies-literature
| S-EPMC7436358 | biostudies-literature
| S-EPMC4752318 | biostudies-literature
| S-EPMC3722516 | biostudies-literature