
Dataset Information


Evaluating a variety of text-mined features for automatic protein function prediction with GOstruct.


ABSTRACT: Most computational methods that predict protein function do not take advantage of the large amount of information contained in the biomedical literature. In this work we evaluate both ontology term co-mention and bag-of-words features mined from the biomedical literature and analyze their impact in the context of a structured output support vector machine model, GOstruct. We find that even simple literature-based features are useful for predicting human protein function (F-max: Molecular Function = 0.408, Biological Process = 0.461, Cellular Component = 0.608). One advantage of using literature features is that they allow automated predictions to be verified easily. Through manual inspection of misclassifications, we find that some false positive predictions could be biologically valid, based on support extracted from the literature. Additionally, we present a "medium-throughput" pipeline that was used to annotate a large subset of co-mentions; we suggest that this strategy could help to speed up the rate at which proteins are curated.
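
The abstract reports results as F-max without defining it. As a rough illustration only, the sketch below computes a protein-centric F-max in the style commonly used for function-prediction benchmarks: for each score threshold, precision is averaged over proteins that have at least one prediction, recall is averaged over all proteins, and F-max is the best harmonic mean across thresholds. It is an assumption that this matches the exact evaluation protocol behind the numbers above; the function name `f_max`, the threshold grid, and the toy term labels are hypothetical and not taken from GOstruct.

```python
def f_max(predictions, truth, thresholds=None):
    """Protein-centric F-max (illustrative sketch, not the authors' code).

    predictions: {protein_id: {term_id: score in [0, 1]}}
    truth:       {protein_id: set of true term_ids}
    """
    if thresholds is None:
        # Assumed threshold grid: 0.01, 0.02, ..., 1.00
        thresholds = [i / 100 for i in range(1, 101)]

    # Only proteins with at least one true annotation can contribute to recall.
    evaluated = {p: terms for p, terms in truth.items() if terms}
    if not evaluated:
        return 0.0

    best = 0.0
    for t in thresholds:
        precisions = []
        recall_sum = 0.0
        for protein, true_terms in evaluated.items():
            scores = predictions.get(protein, {})
            predicted = {term for term, s in scores.items() if s >= t}
            hits = len(predicted & true_terms)
            if predicted:
                # Precision counts only proteins with a prediction at this threshold.
                precisions.append(hits / len(predicted))
            recall_sum += hits / len(true_terms)
        if not precisions:
            continue
        precision = sum(precisions) / len(precisions)
        recall = recall_sum / len(evaluated)
        if precision + recall > 0:
            best = max(best, 2 * precision * recall / (precision + recall))
    return best


# Toy data with made-up term labels (not real GO identifiers).
toy_predictions = {
    "P1": {"term_a": 0.9, "term_b": 0.4},
    "P2": {"term_c": 0.8},
}
toy_truth = {
    "P1": {"term_a"},
    "P2": {"term_c", "term_d"},
}
toy_score = f_max(toy_predictions, toy_truth)
```

On the toy data, the best threshold keeps `term_a` and `term_c` and drops `term_b`, giving precision 1.0, recall 0.75, and an F-max of 6/7 (about 0.857).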

SUBMITTER: Funk CS 

PROVIDER: S-EPMC4441003 | biostudies-literature | 2015

REPOSITORIES: biostudies-literature


Publications

Evaluating a variety of text-mined features for automatic protein function prediction with GOstruct.

Funk CS, Kahanda I, Ben-Hur A, Verspoor KM

Journal of Biomedical Semantics, 2015-03-18



Similar Datasets

| S-EPMC3584852 | biostudies-literature
| S-EPMC8573063 | biostudies-literature
| S-EPMC2500093 | biostudies-literature
| S-EPMC1395344 | biostudies-literature
| S-EPMC6501925 | biostudies-literature
| S-EPMC6794279 | biostudies-literature
| S-EPMC1560402 | biostudies-literature
| S-EPMC3102890 | biostudies-other
| S-EPMC6649004 | biostudies-literature
| S-EPMC2767227 | biostudies-literature