
Dataset Information


Rapid identification of sequences for orphan enzymes to power accurate protein annotation.


ABSTRACT: The power of genome sequencing depends on our ability to understand what those genes and their protein products actually do. The automated methods used to assign functions to putative proteins in newly sequenced organisms are limited by the size of our library of proteins with both known function and known sequence. Unfortunately, this library grows slowly, lagging well behind the rapid increase in novel protein sequences produced by modern genome sequencing methods. One potential source for rapidly expanding this functional library is the "back catalog" of enzymology: "orphan enzymes," which have been characterized biochemically yet lack any associated sequence. There are hundreds of orphan enzymes in the Enzyme Commission (EC) database alone. In this study, we demonstrate that this orphan enzyme "back catalog" is a fertile source for rapidly advancing the state of protein annotation. Starting from three orphan enzyme samples, we applied mass-spectrometry-based analysis and computational methods (including sequence similarity networks, sequence and structural alignments, and operon context analysis) to rapidly identify the specific sequence for each orphan while avoiding the most time- and labor-intensive aspects of typical sequence identification. We then used these three new sequences to more accurately predict the catalytic function of 385 previously uncharacterized or misannotated proteins. We expect that this kind of rapid sequence identification could be applied efficiently on a larger scale, making enzymology's "back catalog" another powerful tool for driving accurate genome annotation.

SUBMITTER: Ramkissoon KR 

PROVIDER: S-EPMC3875567 | biostudies-literature | 2013

REPOSITORIES: biostudies-literature


Publications

Rapid identification of sequences for orphan enzymes to power accurate protein annotation.

Ramkissoon KR, Miller JK, Ojha S, Watson DS, Bomar MG, Galande AK, Shearer AG

PLoS ONE, 2013 Dec 30, issue 12



Similar Datasets

| S-EPMC8445202 | biostudies-literature
| S-EPMC4020792 | biostudies-literature
| S-EPMC3377989 | biostudies-literature
2019-07-03 | GSE125218 | GEO
| S-EPMC4084501 | biostudies-literature
| S-EPMC2732367 | biostudies-literature
| S-EPMC3125804 | biostudies-literature
| S-EPMC8557692 | biostudies-literature
| S-EPMC6528300 | biostudies-literature
| S-EPMC2957682 | biostudies-literature