
Dataset Information


On the Natural Structure of Amino Acid Patterns in Families of Protein Sequences.


ABSTRACT: All known terrestrial proteins are coded as continuous strings of ≈20 amino acids. The patterns formed by the repetitions of elements in groups of finite sequences describe the natural architectures of protein families. We present a method to search for patterns and groupings of patterns in protein sequences using a mathematically precise definition for "repetition", an efficient algorithmic implementation, and a robust scoring system with no adjustable parameters. We show that the sequence patterns can be well separated into disjoint classes according to their recurrence in nested structures. The statistics of the occurrences of patterns indicate that short repetitions are sufficient to account for the differences between natural families and randomized groups of sequences by more than 10 standard deviations, while contiguous sequence patterns shorter than 5 residues are effectively random in their occurrences. A small subset of patterns is sufficient to account for a robust "familiarity" definition between arbitrary sets of sequences.
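
To make the comparison between natural families and randomized groups concrete, the following is a minimal Python sketch of one way such a comparison could be set up. It is not the authors' method: the abstract does not give their definition of "repetition" or their scoring system, so this sketch swaps in a much simpler stand-in (contiguous k-mers shared by at least two sequences) and a plain z-score against residue-shuffled controls. All function names and parameters (shared_kmers, shuffle_seq, z_score, n_controls, seed) are hypothetical.

```python
import math
import random
from statistics import mean, pstdev


def shared_kmers(seqs, k):
    """Return the set of contiguous k-mers found in at least two sequences.

    Assumption: this is a simplified stand-in for a "repetition";
    the paper's precise definition is not given in the abstract.
    """
    owners = {}
    for idx, seq in enumerate(seqs):
        for i in range(len(seq) - k + 1):
            owners.setdefault(seq[i:i + k], set()).add(idx)
    return {kmer for kmer, idxs in owners.items() if len(idxs) >= 2}


def shuffle_seq(seq, rng):
    """Shuffle residues within one sequence, preserving its composition."""
    chars = list(seq)
    rng.shuffle(chars)
    return "".join(chars)


def z_score(seqs, k, n_controls=200, seed=0):
    """Z-score of the shared k-mer count against shuffled control groups.

    Assumption: a plain z-score is used here only for illustration;
    it is not the parameter-free scoring system the abstract refers to.
    """
    rng = random.Random(seed)
    observed = len(shared_kmers(seqs, k))
    controls = [
        len(shared_kmers([shuffle_seq(s, rng) for s in seqs], k))
        for _ in range(n_controls)
    ]
    mu = mean(controls)
    sigma = pstdev(controls)
    if sigma == 0:
        # All controls gave the same count; the z-score is undefined,
        # so report 0 for a tie and a signed infinity otherwise.
        return 0.0 if observed == mu else math.copysign(math.inf, observed - mu)
    return (observed - mu) / sigma
```

Calling z_score(family, k) on a list of sequence strings for several values of k would show how the separation from shuffled controls varies with pattern length under this toy definition. Whether it reproduces the thresholds reported in the abstract is not something this sketch can establish.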

SUBMITTER: Turjanski P 

PROVIDER: S-EPMC7184844 | biostudies-literature | 2018 Dec

REPOSITORIES: biostudies-literature


Publications

On the Natural Structure of Amino Acid Patterns in Families of Protein Sequences.

Turjanski Pablo, Ferreiro Diego U

The Journal of Physical Chemistry B, 2018-10-08, issue 49



Similar Datasets

| S-EPMC4047675 | biostudies-literature
| S-EPMC2144659 | biostudies-other
| S-EPMC5375668 | biostudies-literature
| S-EPMC3001449 | biostudies-literature
| S-EPMC3402919 | biostudies-literature
| S-EPMC11232051 | biostudies-literature
| S-EPMC4073670 | biostudies-literature
| S-EPMC8050329 | biostudies-literature
| S-EPMC5145171 | biostudies-literature
| S-EPMC6555512 | biostudies-literature