
Dataset Information


Large-scale analysis of intrinsic disorder flavors and associated functions in the protein sequence universe.


ABSTRACT: Intrinsic disorder (ID) in proteins has been extensively described over the last decade, yet a large-scale classification of ID in proteins is still mostly missing. Here, we provide an extensive analysis of ID across the protein universe in the UniProt database, derived from sequence-based predictions in MobiDB. Almost half the sequences contain an ID region of at least five residues. About 9% of proteins have a long ID region of over 20 residues; these long regions are more abundant in Eukaryotic organisms and most frequently cover less than 20% of the sequence. A small subset of about 67,000 (out of over 80 million) proteins is fully disordered and mostly found in Viruses. Most proteins have only one ID region, with short ID regions evenly distributed along the sequence and long ID regions overrepresented in the center. The charged residue composition of Das and Pappu was used to classify ID proteins by structural propensities and corresponding functional enrichment. Swollen Coils seem to be used mainly as structural components and in biosynthesis in both Prokaryotes and Eukaryotes. In Bacteria, they are confined to the nucleoid, and in Viruses they provide DNA-binding function. Coils & Hairpins seem to be specialized in ribosome binding and methylation activities. Globules & Tadpoles bind antigens in Eukaryotes but are involved in killing other organisms and cytolysis in Bacteria. The Undefined class is used by Bacteria to bind toxic substances and by Viruses to mediate transport and movement between and within organisms. Fully disordered proteins behave similarly, but are enriched for glycine residues and extracellular structures.
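
The length thresholds quoted in the abstract (an ID region of at least five residues, a long ID region of over 20 residues, and fully disordered proteins) can be illustrated with a minimal sketch. It is not the authors' pipeline: the per-residue string of 'D' (disordered) and 'S' (structured) flags is a hypothetical input format, not the MobiDB format, the function names are invented for illustration, and "fully disordered" is assumed here to mean that every residue is flagged as disordered.

```python
from itertools import groupby

MIN_REGION = 5    # abstract: ID region of at least five residues
LONG_REGION = 20  # abstract: a long ID region is over 20 residues


def disordered_regions(states, min_len=MIN_REGION):
    """Return (start, end) pairs, 0-based and end-exclusive, for runs of 'D'.

    `states` is a hypothetical per-residue string such as "SSDDDDDSS".
    Runs shorter than `min_len` are ignored.
    """
    regions = []
    pos = 0
    for state, run in groupby(states):
        length = len(list(run))
        if state == "D" and length >= min_len:
            regions.append((pos, pos + length))
        pos += length
    return regions


def summarize(states):
    """Summarize the ID regions of one protein using the abstract's thresholds."""
    regions = disordered_regions(states)
    lengths = [end - start for start, end in regions]
    return {
        "n_regions": len(regions),
        "n_long": sum(1 for n in lengths if n > LONG_REGION),
        # Fraction of the sequence covered by counted ID regions.
        "coverage": sum(lengths) / len(states) if states else 0.0,
        # Assumption: fully disordered means every residue is flagged 'D'.
        "fully_disordered": bool(states) and set(states) == {"D"},
    }
```

For a hypothetical 100-residue protein with one 5-residue and one 25-residue disordered run, `summarize` would report two regions, one of them long, and a coverage of 0.3.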

SUBMITTER: Necci M 

PROVIDER: S-EPMC5119570 | biostudies-literature | 2016 Dec

REPOSITORIES: biostudies-literature


Publications

Large-scale analysis of intrinsic disorder flavors and associated functions in the protein sequence universe.

Marco Necci, Damiano Piovesan, Silvio C E Tosatto

Protein Science: a publication of the Protein Society, 2016-10-25, issue 12



Similar Datasets

S-EPMC5737077 | biostudies-literature
S-EPMC2533134 | biostudies-literature
S-EPMC240660 | biostudies-literature
S-EPMC3557196 | biostudies-literature
S-EPMC10133388 | biostudies-literature
S-EPMC8295265 | biostudies-literature
S-EPMC7862400 | biostudies-literature
S-EPMC4682632 | biostudies-literature
S-EPMC10762911 | biostudies-literature
S-EPMC3464192 | biostudies-literature