Dataset Information

Towards completion of the Earth's proteome.


ABSTRACT: New protein sequences are deposited in databases at an accelerating pace; however, many of these are homologous to known proteins and could be considered redundant. If all historical releases of the protein database are analysed using the original sequence-clustering procedure described here, the fraction of newly sequenced proteins that are redundant is increasing. We interpret this as an indication that the sequencing of the Earth's proteome--the complete set of proteins on Earth--is approaching completion. We estimate the approximate size of the Earth's proteome to be 5 million sequences, most of which will be identified during the next 5 years. As the Earth's proteome nears completion, cluster analysis of the protein database will become essential to identify under-explored taxa to which future sequencing efforts should be directed and to focus research on protein families without experimental characterization.
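The redundancy measure in the abstract can be illustrated with a small sketch. It is not the authors' clustering procedure: it assumes cluster assignments have already been computed by some method, and the function name `redundant_fraction_by_release`, the record layout, and the toy data are all hypothetical. A sequence counts as redundant here if its cluster already held a sequence from an earlier release.

```python
from collections import defaultdict


def redundant_fraction_by_release(records):
    """Return {release: fraction of that release's new sequences that are redundant}.

    `records` is an iterable of (release, sequence_id, cluster_id) tuples,
    one per sequence, tagged with the release in which it first appeared.
    Assumption: cluster_id comes from a precomputed clustering.
    """
    clusters_by_release = defaultdict(list)
    for release, _sequence_id, cluster_id in records:
        clusters_by_release[release].append(cluster_id)

    seen_clusters = set()  # clusters populated by any earlier release
    fractions = {}
    for release in sorted(clusters_by_release):
        clusters = clusters_by_release[release]
        # Redundant = lands in a cluster that existed before this release.
        redundant = sum(1 for c in clusters if c in seen_clusters)
        fractions[release] = redundant / len(clusters)
        seen_clusters.update(clusters)
    return fractions


# Toy data (hypothetical): three releases with growing overlap.
toy_records = [
    (1, "a", "c1"), (1, "b", "c2"),
    (2, "c", "c1"), (2, "d", "c3"),
    (3, "e", "c1"), (3, "f", "c2"), (3, "g", "c3"), (3, "h", "c4"),
]
fractions = redundant_fraction_by_release(toy_records)
```

In this toy input the fraction rises from release to release, which is the kind of trend the abstract reports for the real database. Two sequences that found a new cluster within the same release are both counted as non-redundant, a simplification chosen for this sketch.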

SUBMITTER: Perez-Iratxeta C 

PROVIDER: S-EPMC2267224 | biostudies-literature | 2007 Dec

REPOSITORIES: biostudies-literature

Publications

Towards completion of the Earth's proteome.

Carolina Perez-Iratxeta, Gareth Palidwor, Miguel A. Andrade-Navarro

EMBO reports, 2007-12-01, issue 12


Similar Datasets

| S-EPMC2395251 | biostudies-literature
| S-EPMC7021095 | biostudies-literature
| S-EPMC4355469 | biostudies-literature
| S-EPMC8169937 | biostudies-literature
| S-EPMC3742771 | biostudies-literature
2012-06-21 | GSE38836 | GEO
| S-EPMC6631695 | biostudies-literature
| S-EPMC5587858 | biostudies-literature
| S-EPMC2614481 | biostudies-literature
| S-EPMC2876126 | biostudies-literature