Dataset Information

Functional Site Discovery From Incomplete Training Data: A Case Study With Nucleic Acid-Binding Proteins.

ABSTRACT: Function annotation efforts provide a foundation for our understanding of cellular processes and the functioning of the living cell. This motivates high-throughput computational methods to characterize new protein members of a particular function. Research work has focused on discriminative machine-learning methods, which promise to make efficient, de novo predictions of protein function. Furthermore, available function annotation exists predominantly for individual proteins rather than for residues, of which only a subset is necessary for the conveyance of a particular function. This limits discriminative approaches to predicting functions for which there is sufficient residue-level annotation, e.g., identification of DNA-binding proteins, or where an excellent global representation can be divined. Complete understanding of the various functions of proteins requires discovery and functional annotation at the residue level. Herein, we cast this problem into the setting of multiple-instance learning, which requires only knowledge of the protein's function yet identifies functionally relevant residues and need not rely on homology. We developed a new multiple-instance learning algorithm derived from AdaBoost and benchmarked this algorithm on two well-studied protein function prediction tasks: annotating proteins that bind DNA and RNA. This algorithm outperforms certain previous approaches in annotating protein function while identifying functionally relevant residues involved in binding both DNA and RNA, and on one protein-DNA benchmark, it achieves near-perfect classification.

SUBMITTER: Wang W

PROVIDER: S-EPMC6729729 | biostudies-literature | 2019

REPOSITORIES: biostudies-literature

Publications

Functional Site Discovery From Incomplete Training Data: A Case Study With Nucleic Acid-Binding Proteins.

Wenchuan Wang, Robert Langlois, Marina Langlois, Georgi Z. Genchev, Xiaolei Wang, Hui Lu

Frontiers in Genetics, 2019-08-30

PMID: 31543893

Similar Datasets

Binding site discovery from nucleic acid sequences by discriminative learning of hidden Markov models. | S-EPMC4245949 | biostudies-literature

A Discovery Funnel for Nucleic Acid Binding Drug Candidates. | S-EPMC3090163 | biostudies-literature

Amino Acid Composition in Various Types of Nucleic Acid-Binding Proteins. | S-EPMC7831508 | biostudies-literature

Leukocyte protease binding to nucleic acids promotes nuclear localization and cleavage of nucleic acid binding proteins. | S-EPMC4041364 | biostudies-literature

Predicting nucleic acid binding interfaces from structural models of proteins. | S-EPMC3290761 | biostudies-literature

Discovery of Nucleic Acid Binding Molecules from Combinatorial Biohybrid Nucleobase Peptide Libraries. | S-EPMC7958298 | biostudies-literature

ProbeRating: a recommender system to infer binding profiles for nucleic acid-binding proteins. | S-EPMC7750938 | biostudies-literature

Structure-based deep learning for binding site detection in nucleic acid macromolecules. | S-EPMC8633674 | biostudies-literature

Structural transformation induced by locked nucleic acid or 2'-O-methyl nucleic acid site-specific modifications on thrombin binding aptamer. | S-EPMC4000052 | biostudies-literature

Affinity regression predicts the recognition code of nucleic acid-binding proteins. | S-EPMC4871164 | biostudies-literature