
Dataset Information


Automatically extracting functionally equivalent proteins from SwissProt.


ABSTRACT: There is a frequent need to obtain sets of functionally equivalent homologous proteins (FEPs) from different species. While orthology usually implies functional equivalence, this is not always true; therefore datasets of orthologous proteins are not appropriate. The information relevant to extracting FEPs is contained in databanks such as UniProtKB/Swiss-Prot, and a manual analysis of these data allows FEPs to be extracted on a one-off basis. However, there has been no resource allowing the easy, automatic extraction of groups of FEPs, for example, all instances of protein C.

We have developed FOSTA, an automatically generated database of FEPs annotated as having the same function in UniProtKB/Swiss-Prot, which can be used for large-scale analysis. The method builds a candidate list of homologues and filters out functionally diverged proteins on the basis of functional annotations using a simple text mining approach. Large-scale evaluation of our FEP extraction method is difficult as there is no gold-standard dataset against which the method can be benchmarked. However, a manual analysis of five protein families confirmed a high level of performance. A more extensive comparison with two manually verified functional equivalence datasets also demonstrated very good performance.

In summary, FOSTA provides an automated analysis of annotations in UniProtKB/Swiss-Prot to enable groups of proteins already annotated as functionally equivalent to be extracted. Our results demonstrate that the vast majority of UniProtKB/Swiss-Prot functional annotations are of high quality, and that FOSTA can interpret annotations successfully. Where FOSTA is not successful, we are able to highlight inconsistencies in UniProtKB/Swiss-Prot annotation. Most of these would have presented equal difficulties for manual interpretation of annotations. We discuss limitations and possible future extensions to FOSTA, and recommend changes to the UniProtKB/Swiss-Prot format that would facilitate text mining of UniProtKB/Swiss-Prot.
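The abstract describes the method only in outline: build a candidate list of homologues, then filter out functionally diverged proteins by comparing functional annotations with simple text mining. The Python sketch below illustrates that general idea and is not FOSTA's actual algorithm. The `Entry` record, the `NOISE_WORDS` list, the exact-match rule on normalised token sets, and the accession strings in the usage example are all assumptions made for illustration; candidate homologues are assumed to have been found already by some separate search.

```python
from __future__ import annotations

import re
from dataclasses import dataclass

# Hypothetical noise words; the abstract does not say which terms FOSTA ignores.
NOISE_WORDS = {"precursor", "fragment", "putative", "probable"}


@dataclass(frozen=True)
class Entry:
    accession: str    # illustrative identifier, not a real database record
    species: str
    description: str  # free-text functional annotation


def normalise(description: str) -> frozenset[str]:
    """Lowercase, split on non-alphanumeric characters, drop noise words."""
    tokens = re.split(r"[^a-z0-9]+", description.lower())
    return frozenset(t for t in tokens if t and t not in NOISE_WORDS)


def filter_equivalents(query: Entry, candidates: list[Entry]) -> list[Entry]:
    """Keep candidates from other species whose normalised annotation
    equals the query's (an assumed, deliberately strict matching rule)."""
    target = normalise(query.description)
    return [
        c for c in candidates
        if c.species != query.species and normalise(c.description) == target
    ]


# Usage with made-up entries: only the mouse protein C entry is kept.
query = Entry("EX0001", "Homo sapiens", "Vitamin K-dependent protein C precursor")
candidates = [
    Entry("EX0002", "Mus musculus", "Vitamin K-dependent protein C (Fragment)"),
    Entry("EX0003", "Rattus norvegicus", "Vitamin K-dependent protein S"),
]
equivalents = filter_equivalents(query, candidates)
```

Comparing token sets ignores word order and repeated words, which keeps the sketch short but would conflate annotations that differ only in ordering; a real system would need a more careful comparison.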

SUBMITTER: McMillan LE 

PROVIDER: S-EPMC2576269 | biostudies-literature | 2008 Oct

REPOSITORIES: biostudies-literature


Publications

Automatically extracting functionally equivalent proteins from SwissProt.

McMillan LEM, Martin ACR

BMC Bioinformatics, 2008-10-06



Similar Datasets

| S-EPMC7545930 | biostudies-literature
| S-EPMC3035632 | biostudies-literature
| S-EPMC1874638 | biostudies-literature
| S-EPMC2923139 | biostudies-literature
| S-EPMC8789116 | biostudies-literature
| S-EPMC3534038 | biostudies-literature
2024-05-29 | GSE208708 | GEO