Dataset Information

Annotation of protein residues based on a literature analysis: cross-validation against UniProtKb.

ABSTRACT:

Background

A protein annotation database, such as the Universal Protein Resource knowledge base (UniProtKb), is a valuable resource for the validation and interpretation of predicted 3D structure patterns in proteins. Existing studies have focussed on methods for extracting point mutations from the biomedical literature, which can be used to support the time-consuming work of manual database curation. However, these methods are limited to point mutation extraction and do not extract features for the annotation of proteins at the residue level.

Results

This work introduces a system that identifies protein residues in MEDLINE abstracts and annotates them with features extracted from the surrounding text. MEDLINE abstracts were processed to identify protein mentions in combination with taxonomic species and protein residues (F1-measure 0.52). The identified protein-species-residue triplets were validated and benchmarked against reference data resources (UniProtKb, average F1-measure of 0.54). Contextual features were then extracted through shallow and deep parsing and classified into predefined categories (F1-measure ranging from 0.15 to 0.67). Furthermore, the feature sets were aligned with annotation types in UniProtKb to assess the relevance of the annotations for ongoing curation projects. Altogether, the annotations were assessed both automatically and manually against reference data resources.
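As a rough illustration of two ideas mentioned above, residue mention detection and F1-measure scoring, the following is a minimal sketch only. It is not the authors' system, which also recognises proteins and species and applies shallow and deep parsing. The regular expression, the function names `find_residues` and `f1_score`, and the example sentence are all assumptions introduced here for illustration.

```python
import re

# Full names of the 20 standard amino acids mapped to three-letter codes.
_NAMES = {
    "alanine": "Ala", "arginine": "Arg", "asparagine": "Asn", "aspartate": "Asp",
    "cysteine": "Cys", "glutamine": "Gln", "glutamate": "Glu", "glycine": "Gly",
    "histidine": "His", "isoleucine": "Ile", "leucine": "Leu", "lysine": "Lys",
    "methionine": "Met", "phenylalanine": "Phe", "proline": "Pro", "serine": "Ser",
    "threonine": "Thr", "tryptophan": "Trp", "tyrosine": "Tyr", "valine": "Val",
}
# Accept both full names and three-letter codes (lower-cased keys).
_LOOKUP = {**_NAMES, **{code.lower(): code for code in _NAMES.values()}}

# Hypothetical pattern: matches forms such as "Ser195", "His-57", "aspartate 102".
# Longer alternatives are listed first so full names win over three-letter codes.
_RESIDUE_RE = re.compile(
    r"\b(" + "|".join(sorted(_LOOKUP, key=len, reverse=True)) + r")[- ]?(\d+)\b",
    re.IGNORECASE,
)


def find_residues(text):
    """Return the set of (three_letter_code, position) pairs mentioned in text."""
    return {
        (_LOOKUP[m.group(1).lower()], int(m.group(2)))
        for m in _RESIDUE_RE.finditer(text)
    }


def f1_score(predicted, gold):
    """Harmonic mean of precision and recall over two sets of annotations."""
    true_positives = len(predicted & gold)
    if true_positives == 0:
        return 0.0
    precision = true_positives / len(predicted)
    recall = true_positives / len(gold)
    return 2 * precision * recall / (precision + recall)


# Usage: compare extracted residues with a made-up gold-standard set.
sentence = "The catalytic triad consists of Ser195, His-57 and aspartate 102."
predicted = find_residues(sentence)
gold = {("Ser", 195), ("His", 57), ("Asp", 102), ("Gly", 193)}
score = f1_score(predicted, gold)
```

A pattern this simple will produce false positives (for example, the pronoun in "his 3 samples" would be read as a histidine), which is one reason a real system would validate candidates against a protein and species context before accepting them.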

Conclusion

This work proposes a solution for the automatic extraction of functional annotations for protein residues from biomedical articles. The presented approach extends existing systems in that a wider range of residue entities is considered and features of residues are extracted as annotations.

SUBMITTER: Nagel K

PROVIDER: S-EPMC2745586 | biostudies-literature | 2009 Aug

REPOSITORIES: biostudies-literature

Publications

Annotation of protein residues based on a literature analysis: cross-validation against UniProtKb.

Kevin Nagel, Antonio Jimeno-Yepes, Dietrich Rebholz-Schuhmann

BMC Bioinformatics, 2009-08-27

PMID: 19758468

Similar Datasets

Enzyme annotation in UniProtKB using Rhea. | S-EPMC7162351 | biostudies-literature

Application of text-mining for updating protein post-translational modification annotation in UniProtKB. | S-EPMC3660268 | biostudies-literature

Annotation of biologically relevant ligands in UniProtKB using ChEBI. | S-EPMC9825770 | biostudies-literature

FragKB: structural and literature annotation resource of conserved peptide fragments and residues. | S-EPMC2841175 | biostudies-literature

The UniProtKB/Swiss-Prot knowledgebase and its Plant Proteome Annotation Program. | S-EPMC2689360 | biostudies-literature

Challenges in the annotation of pseudoenzymes in databases: the UniProtKB approach. | S-EPMC7160037 | biostudies-literature

Collaborative annotation of genes and proteins between UniProtKB/Swiss-Prot and dictyBase. | S-EPMC2790310 | biostudies-literature

A semantic-based workflow for biomedical literature annotation. | S-EPMC5691355 | biostudies-literature

Partially-supervised protein subclass discovery with simultaneous annotation of functional residues. | S-EPMC2777906 | biostudies-literature

Automatic consistency assurance for literature-based gene ontology annotation. | S-EPMC8620237 | biostudies-literature