Dataset Information

Automatically extracting functionally equivalent proteins from SwissProt.

ABSTRACT:

Background

There is a frequent need to obtain sets of functionally equivalent homologous proteins (FEPs) from different species. While it is usually the case that orthology implies functional equivalence, this is not always true; therefore datasets of orthologous proteins are not appropriate. The information relevant to extracting FEPs is contained in databanks such as UniProtKB/Swiss-Prot, and a manual analysis of these data allows FEPs to be extracted on a one-off basis. However, there has been no resource allowing the easy, automatic extraction of groups of FEPs - for example, all instances of protein C. We have developed FOSTA, an automatically generated database of FEPs annotated as having the same function in UniProtKB/Swiss-Prot, which can be used for large-scale analysis. The method builds a candidate list of homologues and filters out functionally diverged proteins on the basis of functional annotations using a simple text mining approach.
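The filtering step described above can be illustrated with a minimal sketch. This is not FOSTA's actual algorithm, which the abstract does not specify beyond "a simple text mining approach"; it assumes that two proteins count as functionally equivalent when their annotation descriptions are identical after basic normalisation. The function names, the protein identifiers and the example descriptions are all hypothetical.

```python
import re


def normalise(description: str) -> str:
    """Lower-case a description and collapse punctuation and whitespace."""
    text = re.sub(r"[^a-z0-9 ]+", " ", description.lower())
    return " ".join(text.split())


def filter_feps(query_description: str, candidates: dict[str, str]) -> list[str]:
    """Keep candidate homologues whose normalised description matches the query.

    Assumption: exact match after normalisation stands in for the
    annotation comparison; the real method may be more permissive.
    """
    target = normalise(query_description)
    return [pid for pid, desc in candidates.items() if normalise(desc) == target]


# Hypothetical candidate homologues (identifier -> description text).
candidates = {
    "HYPOTHETICAL_1": "Vitamin K-dependent protein C",
    "HYPOTHETICAL_2": "Vitamin K-dependent protein C.",
    "HYPOTHETICAL_3": "Coagulation factor X",
}

feps = filter_feps("Vitamin K-dependent protein C", candidates)
```

Here the third candidate is discarded as functionally diverged because its description differs, while trivial differences in punctuation between the first two are ignored.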

Results

Large-scale evaluation of our FEP extraction method is difficult as there is no gold-standard dataset against which the method can be benchmarked. However, a manual analysis of five protein families confirmed a high level of performance. A more extensive comparison with two manually verified functional equivalence datasets also demonstrated very good performance.
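A comparison against a manually verified dataset of the kind mentioned above could be scored as follows. This is a sketch under the assumption that precision and recall are the measures of interest; the abstract does not state which measures were used. The identifiers are placeholders, not real results.

```python
def precision_recall(predicted, reference):
    """Compare a predicted set of FEPs with a manually verified reference set."""
    predicted, reference = set(predicted), set(reference)
    true_positives = len(predicted & reference)
    # Guard against empty sets to avoid division by zero.
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(reference) if reference else 0.0
    return precision, recall


# Placeholder identifiers for illustration only.
predicted = {"P_A", "P_B", "P_C", "P_D"}
reference = {"P_A", "P_B", "P_C", "P_E"}

precision, recall = precision_recall(predicted, reference)
```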

Conclusion

In summary, FOSTA provides an automated analysis of annotations in UniProtKB/Swiss-Prot so that groups of proteins already annotated as functionally equivalent can be extracted. Our results demonstrate that the vast majority of UniProtKB/Swiss-Prot functional annotations are of high quality, and that FOSTA can interpret annotations successfully. Where FOSTA is not successful, we are able to highlight inconsistencies in UniProtKB/Swiss-Prot annotation. Most of these would have presented equal difficulties for manual interpretation of annotations. We discuss limitations and possible future extensions to FOSTA, and recommend changes to the UniProtKB/Swiss-Prot format that would facilitate text-mining of UniProtKB/Swiss-Prot.

SUBMITTER: McMillan LE

PROVIDER: S-EPMC2576269 | biostudies-literature | 2008 Oct

REPOSITORIES: biostudies-literature

Publications

Automatically extracting functionally equivalent proteins from SwissProt.

McMillan Lisa E M, Martin Andrew C R

BMC Bioinformatics, 2008 Oct 6

PMID: 18838004

Similar Datasets

Project description: Mycobacterium marinum is a nontuberculous pathogen of poikilothermic fish and an opportunistic human pathogen. Like tuberculous mycobacteria, the M. marinum M strain requires the ESX-1 (ESAT-6 system 1) secretion system for virulence in host cells. EsxB and EsxA, two major virulence factors exported by the ESX-1 system, are encoded by the esxBA genes within the ESX-1 locus. Deletion of the esxBA genes abrogates ESX-1 export and attenuates M. marinum in ex vivo and in vivo models of infection. Interestingly, there are several duplications of the esxB and esxA genes (esxB_1, esxB_2, esxA_1, esxA_2, and esxA_3) in the M. marinum M genome located outside the ESX-1 locus. We sought to understand if this region, known as ESX-6, contributes to ESX-1-mediated virulence. We found that deletion of the esxB_1 gene alone or the entire ESX-6 locus did not impact ESX-1 export or function, supporting the idea that the esxBA genes present at the ESX-1 locus are the primary contributors to ESX-1-mediated virulence. Nevertheless, overexpression of the esxB_1 locus complemented ESX-1 function in the ΔesxBA strain, signifying that the two loci are functionally equivalent. Our findings raise questions about why duplicate versions of the esxBA genes are maintained in the M. marinum M genome and how these proteins, which are functionally equivalent to virulence factors, contribute to mycobacterial biology.

IMPORTANCE: Mycobacterium tuberculosis is the causative agent of the human disease tuberculosis (TB). There are 10.4 million cases and 1.7 million TB-associated deaths annually, making TB a leading cause of death globally. Nontuberculous mycobacteria (NTM) cause chronic human infections that are acquired from the environment. Despite differences in disease etiology, both tuberculous and NTM pathogens use the ESX-1 secretion system to cause disease. The nontubercular mycobacterial species, Mycobacterium marinum, has additional copies of specific ESX-1 genes. Our findings demonstrate that the duplicated genes do not contribute to virulence but can substitute for virulence factors in M. marinum. These findings suggest that the duplicated genes may play a specific role in NTM biology.
