Dataset Information

The scale-free nature of protein sequence space.

ABSTRACT: The sequence space of five protein superfamilies was investigated by constructing sequence networks. The nodes represent individual sequences, and two nodes are connected by an edge if the global sequence identity of two sequences exceeds a threshold. The networks were characterized by their degree distribution (number of nodes with a given number of neighbors) and by their fractal network dimension. Although the five protein families differed in sequence length, fold, and domain arrangement, their network properties were similar. The fractal network dimension Df was distance-dependent: a high dimension for single and double mutants (Df = 4.0), which dropped to Df = 0.7-1.0 at 90% sequence identity, and increased to Df = 3.5-4.5 below 70% sequence identity. The distance dependency of the network dimension is consistent with evolutionary constraints for functional proteins. While random single and double mutations often result in a functional protein, the accumulation of more than ten mutations is dominated by epistasis. The networks of the five protein families were highly inhomogeneous with few highly connected communities ("hub sequences") and a large number of smaller and less connected communities. The degree distributions followed a power-law distribution with similar scaling exponents close to 1. Because the hub sequences have a large number of functional neighbors, they are expected to be robust toward possible deleterious effects of mutations. Because of their robustness, hub sequences have the potential of high innovability, with additional mutations readily inducing new functions. Therefore, they form hotspots of evolution and are promising candidates as starting points for directed evolution experiments in biotechnology.
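The network construction described in the abstract can be illustrated with a minimal sketch. It is not the authors' pipeline. It assumes the sequences are already aligned to equal length, so that identity is simply the fraction of matching positions (the study uses global sequence identity, which requires an alignment step omitted here). The function names, the toy sequences, and the 0.85 threshold are hypothetical.

```python
from collections import Counter
from itertools import combinations


def sequence_identity(a, b):
    """Fraction of identical positions between two pre-aligned, equal-length sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must be pre-aligned to equal length")
    return sum(x == y for x, y in zip(a, b)) / len(a)


def build_network(sequences, threshold):
    """Return an adjacency dict; an edge joins two sequences whose identity exceeds the threshold."""
    neighbors = {name: set() for name in sequences}
    for (n1, s1), (n2, s2) in combinations(sequences.items(), 2):
        if sequence_identity(s1, s2) > threshold:
            neighbors[n1].add(n2)
            neighbors[n2].add(n1)
    return neighbors


def degree_distribution(neighbors):
    """Map each degree k to the number of nodes that have exactly k neighbors."""
    return Counter(len(adjacent) for adjacent in neighbors.values())


# Hypothetical toy sequences, not taken from the study.
toy = {
    "hub": "ACDEFGHIKL",
    "m1": "ACDEFGHIKV",   # one substitution relative to hub
    "m2": "ACDEFGHIRL",   # one substitution relative to hub
    "m3": "TCDEFGHIKL",   # one substitution relative to hub
    "far": "WWWWWGHIKL",  # five substitutions relative to hub
}

network = build_network(toy, threshold=0.85)
distribution = degree_distribution(network)
```

In this toy set, "hub" is the only node connected to all three single mutants, while the mutants differ from each other at two positions and so are not linked to one another. Real networks of this kind would be built from many thousands of sequences, and the degree counts would then be examined on a log-log scale to judge whether they follow a power law.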

SUBMITTER: Buchholz PCF

PROVIDER: S-EPMC6070207 | biostudies-literature | 2018

REPOSITORIES: biostudies-literature

Publications

The scale-free nature of protein sequence space.

Buchholz PCF, Zeil C, Pleiss J

PLoS One, 2018-08-01, issue 8

PMID: 30067815

Similar Datasets

Thoroughly sampling sequence space: large-scale protein design of structural ensembles.
| S-EPMC2373757 | biostudies-literature

Free-space-coupled wavelength-scale disk resonators.
| S-EPMC11501722 | biostudies-literature

Percolation in protein sequence space.
| S-EPMC5738032 | biostudies-literature

PairsDB atlas of protein sequence space.
| S-EPMC2238971 | biostudies-literature

Alignment-free viral sequence classification at scale.
| S-EPMC12007369 | biostudies-literature

Large-scale mapping of bioactive peptides in structural and sequence space.
| S-EPMC5774755 | biostudies-literature

Protein families and TRIBES in genome sequence space.
| S-EPMC169885 | biostudies-literature

Large-scale network analysis reveals the sequence space architecture of antibody repertoires.
| S-EPMC6428871 | biostudies-literature

Toward a genome scale sequence specific dynamic model of cell-free protein synthesis in Escherichia coli.
| S-EPMC7136494 | biostudies-literature

Engineering orthogonal signaling pathways reveals the sparse distribution of protein protein interactions in sequence space
2019-08-05 | GSE120789 | GEO