Dataset Information

Nature of the protein universe.


ABSTRACT: The protein universe is the set of all proteins of all organisms. Here, all currently known sequences are analyzed in terms of families that have single-domain or multidomain architectures and whether they have a known three-dimensional structure. Growth of new single-domain families is very slow: Almost all growth comes from new multidomain architectures that are combinations of domains characterized by approximately 15,000 sequence profiles. Single-domain families are mostly shared by the major groups of organisms, whereas multidomain architectures are specific and account for species diversity. There are known structures for a quarter of the single-domain families, and >70% of all sequences can be partially modeled thanks to their membership in these families.

SUBMITTER: Levitt M 

PROVIDER: S-EPMC2698892 | biostudies-literature

REPOSITORIES: biostudies-literature

Similar Datasets

| S-EPMC3118404 | biostudies-literature
| S-EPMC4136566 | biostudies-literature
| S-EPMC4845728 | biostudies-literature
| S-EPMC403752 | biostudies-literature
| S-EPMC2744874 | biostudies-literature
| S-EPMC5127300 | biostudies-literature
| S-EPMC6481876 | biostudies-literature
| S-EPMC2973819 | biostudies-literature
| S-EPMC3557196 | biostudies-literature
| S-EPMC10133388 | biostudies-literature