Dataset Information

Multifaceted protein-protein interaction prediction based on Siamese residual RCNN.

ABSTRACT:

Motivation

Sequence-based protein-protein interaction (PPI) prediction represents a fundamental computational biology problem. To address this problem, extensive research efforts have been made to extract predefined features from the sequences. Based on these features, statistical algorithms are learned to classify the PPIs. However, such explicit features are usually costly to extract, and typically have limited coverage of the PPI information.

Results

We present an end-to-end framework, PIPR (Protein-Protein Interaction Prediction Based on Siamese Residual RCNN), for PPI predictions using only the protein sequences. PIPR incorporates a deep residual recurrent convolutional neural network in a Siamese architecture, leveraging both robust local features and contextualized information, which are significant for capturing the mutual influence of protein sequences. PIPR reduces the data pre-processing effort required by other systems and generalizes well to different application scenarios. Experimental evaluations show that PIPR outperforms various state-of-the-art systems on the binary PPI prediction problem. Moreover, it shows promising performance on the more challenging problems of interaction type prediction and binding affinity estimation, where existing approaches fall short.
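The Siamese idea described above (one encoder, with shared weights, applied to both protein sequences) can be illustrated with a minimal sketch. This is not the PIPR implementation: the residual recurrent units are left out, the weights are random and untrained, and combining the two embeddings by element-wise product is an assumption made here for illustration. All function names (`one_hot`, `make_encoder`, `encode`, `interaction_score`) and the example sequences are hypothetical.

```python
import math
import random

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues


def one_hot(seq):
    """Encode a protein sequence as a list of 20-dimensional one-hot vectors."""
    return [[1.0 if aa == ref else 0.0 for ref in AMINO_ACIDS] for aa in seq]


def make_encoder(hidden=8, width=3, seed=0):
    """Create one set of random, untrained convolution filters (hypothetical helper)."""
    rng = random.Random(seed)
    n_in = width * len(AMINO_ACIDS)
    return [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(hidden)]


def encode(seq, filters, width=3):
    """Shared encoder: 1D convolution with ReLU, then global max pooling."""
    x = one_hot(seq)
    pooled = [0.0] * len(filters)
    for start in range(len(x) - width + 1):
        # Flatten the window of `width` residues into one input vector.
        window = [v for row in x[start:start + width] for v in row]
        for k, f in enumerate(filters):
            act = max(0.0, sum(w * v for w, v in zip(f, window)))
            pooled[k] = max(pooled[k], act)
    return pooled


def interaction_score(seq_a, seq_b, filters, out_weights, width=3):
    """Score a pair by encoding both sequences with the SAME filters."""
    emb_a = encode(seq_a, filters, width)
    emb_b = encode(seq_b, filters, width)
    # Assumption: combine the two embeddings by element-wise product.
    combined = [a * b for a, b in zip(emb_a, emb_b)]
    logit = sum(w * c for w, c in zip(out_weights, combined))
    return 1.0 / (1.0 + math.exp(-logit))  # sigmoid, as for binary prediction


filters = make_encoder()
out_weights = [0.1] * len(filters)
score_ab = interaction_score("MKTAYIAKQR", "GSHMLEDPVK", filters, out_weights)
score_ba = interaction_score("GSHMLEDPVK", "MKTAYIAKQR", filters, out_weights)
```

Because both inputs pass through the same filters and the combination step is symmetric, `score_ab` equals `score_ba`: the score does not depend on the order in which the two proteins are given.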

Availability and implementation

The implementation is available at https://github.com/muhaochen/seq_ppi.git.

Supplementary information

Supplementary data are available at Bioinformatics online.

SUBMITTER: Chen M

PROVIDER: S-EPMC6681469 | biostudies-literature | 2019 Jul

REPOSITORIES: biostudies-literature

Publications

Multifaceted protein-protein interaction prediction based on Siamese residual RCNN.

Muhao Chen, Chelsea J.-T. Ju, Guangyu Zhou, Xuelu Chen, Tianran Zhang, Kai-Wei Chang, Carlo Zaniolo, Wei Wang

Bioinformatics (Oxford, England), 2019-07-01, issue 14

PMID: 31510705

Similar Datasets

Project description:

Background: Protein-protein interactions can be seen as a hierarchical process occurring at three related levels: proteins bind by means of specific domains, which in turn form interfaces through patches of residues. Detailed knowledge about which domains and residues are involved in a given interaction has extensive applications to biology, including better understanding of the binding process and more efficient drug/enzyme design. Alas, most current interaction prediction methods do not identify which parts of a protein actually instantiate an interaction. Furthermore, they also fail to leverage the hierarchical nature of the problem, ignoring otherwise useful information available at the lower levels; when they do, they do not generate predictions that are guaranteed to be consistent between levels.

Results: Inspired by earlier ideas of Yip et al. (BMC Bioinformatics 10:241, 2009), in the present paper we view the problem as a multi-level learning task, with one task per level (proteins, domains and residues), and propose a machine learning method that collectively infers the binding state of all object pairs. Our method is based on Semantic Based Regularization (SBR), a flexible and theoretically sound machine learning framework that uses First Order Logic constraints to tie the learning tasks together. We introduce a set of biologically motivated rules that enforce consistent predictions between the hierarchy levels.

Conclusions: We study the empirical performance of our method using a standard validation procedure, and compare its performance against the only other existing multi-level prediction technique. We present results showing that our method substantially outperforms the competitor in several experimental settings, indicating that exploiting the hierarchical nature of the problem can lead to better predictions. In addition, our method is also guaranteed to produce interactions that are consistent with respect to the protein-domain-residue hierarchy.
