Dataset Information

A boosting approach for prediction of protein-RNA binding residues.

ABSTRACT:

Background

RNA binding proteins play important roles in post-transcriptional RNA processing and transcriptional regulation. Distinguishing the RNA-binding residues in proteins is crucial for understanding how protein and RNA recognize each other and function together as a complex.

Results

We propose PredRBR, an effective computational approach to predict RNA-binding residues. PredRBR is built with gradient tree boosting and an optimal feature set selected from a large number of sequence and structure characteristics and two categories of structural neighborhood properties. Cross-validation experiments on the RBP170 data set show that PredRBR achieves an overall accuracy of 0.84, a sensitivity of 0.85, an MCC of 0.55 and an AUC of 0.92, which are significantly better than those of other widely used machine learning algorithms such as Support Vector Machine, Random Forest, and Adaboost. We further calculate the feature importance of different feature categories and find that structural neighborhood characteristics are critical in the recognition of RNA-binding residues. PredRBR also yields significantly better prediction accuracy on an independent test set (RBP101) than other state-of-the-art methods.
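For readers unfamiliar with the metrics quoted above, the following is a minimal sketch of how accuracy, sensitivity and the Matthews correlation coefficient (MCC) are conventionally derived from a binary confusion matrix. The function name `binary_metrics` and the example counts are illustrative assumptions, not taken from the PredRBR paper or its data.

```python
import math


def binary_metrics(tp, fp, tn, fn):
    """Return accuracy, sensitivity and MCC from confusion-matrix counts.

    tp/fn count binding residues predicted correctly/incorrectly;
    tn/fp count non-binding residues predicted correctly/incorrectly.
    """
    total = tp + fp + tn + fn
    accuracy = (tp + tn) / total
    # Sensitivity (recall): fraction of true binding residues recovered.
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0
    # MCC is conventionally set to 0 when any marginal total is zero.
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {"accuracy": accuracy, "sensitivity": sensitivity, "mcc": mcc}


# Hypothetical counts chosen only to exercise the function.
example = binary_metrics(tp=85, fp=130, tn=770, fn=15)
```

Because binding residues are usually a small minority of all residues, MCC is often a more informative summary than accuracy alone, which is presumably why both are reported.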

Conclusions

The superior performance over existing RNA-binding residue prediction methods indicates the importance of the gradient tree boosting algorithm combined with the optimally selected features.

SUBMITTER: Tang Y

PROVIDER: S-EPMC5773889 | biostudies-literature | 2017 Dec

REPOSITORIES: biostudies-literature


Publications

A boosting approach for prediction of protein-RNA binding residues.

Tang Y, Liu D, Wang Z, Wen T, Deng L

BMC Bioinformatics, 2017-12-01, Suppl 13


PMID: 29219069

Similar Datasets

Project description: A wide range of biological processes, including regulation of gene expression, protein synthesis, and replication and assembly of many viruses, are mediated by RNA-protein interactions. However, experimental determination of the structures of protein-RNA complexes is expensive and technically challenging. Hence, a number of computational tools have been developed for predicting protein-RNA interfaces. Some of the state-of-the-art protein-RNA interface predictors rely on position-specific scoring matrix (PSSM)-based encoding of the protein sequences. The computational effort needed for generating PSSMs severely limits the practical utility of protein-RNA interface prediction servers. In this work, we experiment with two approaches, random sampling and sequence similarity reduction, for extracting a representative reference database of protein sequences from more than 50 million protein sequences in UniRef100. Our results suggest that randomly sampled databases produce better PSSM profiles, as measured by the number of hits used to generate the profile, the distance of the generated profile to the corresponding profile generated using the entire UniRef100 data, and the accuracy of the machine learning classifier trained using these profiles. Based on our results, we developed FastRNABindR, an improved version of RNABindR for predicting protein-RNA interface residues using PSSM profiles generated using 1% of the UniRef100 sequences sampled uniformly at random. To the best of our knowledge, FastRNABindR is the only protein-RNA interface residue prediction online server that requires generation of PSSM profiles for query sequences and accepts hundreds of protein sequences per submission.
Our approach for determining the optimal BLAST database for a protein-RNA interface residue classification task has the potential to substantially speed up, and hence increase the practical utility of, other amino acid sequence-based predictors of protein-protein and protein-DNA interfaces.
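The random-sampling idea described above can be illustrated with a short sketch that keeps each FASTA record independently with a fixed probability. This is an assumption-laden illustration, not the FastRNABindR pipeline: the helper names `read_fasta` and `sample_fasta` are hypothetical, and per-record Bernoulli sampling yields approximately (not exactly) the requested fraction of sequences.

```python
import random


def read_fasta(lines):
    """Yield (header, sequence) pairs from an iterable of FASTA lines."""
    header, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new header closes the previous record, if any.
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def sample_fasta(lines, fraction=0.01, seed=0):
    """Keep each record independently with probability `fraction`.

    Streaming over the input avoids loading a large database into
    memory; a fixed seed makes the reduced database reproducible.
    """
    rng = random.Random(seed)
    return [rec for rec in read_fasta(lines) if rng.random() < fraction]
```

In practice the input would be an open file handle over the full sequence database, and the sampled records would be written back out in FASTA format before building the search database.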

