
Dataset Information


A conserved extraordinarily long serine homopolymer in Dictyostelid amoebae.


ABSTRACT: Eukaryotic protein sequences often contain amino-acid homopolymers that consist of a single amino acid repeated from several to dozens of times. Some of these are functional but others may persist largely because of high expansion rates due to DNA slippage. However, very long homopolymers with over a hundred repeats are very rare. We report an extraordinarily long homopolymer consisting of 306 tandem serine repeats from the single-celled eukaryote Dictyostelium discoideum, which also has a multicellular stage. The gene has a paralog with 132 repeats and orthologs, also with high serine repeat numbers, in various other Dictyostelid species. The conserved gene structure and protein sequences suggest that the homopolymer is functional. The high codon diversity and very poor alignment of serine codons in this gene between species similarly indicate functionality. This is because the serine homopolymer is conserved despite much DNA sequence change. A survey of other very long amino-acid homopolymers in eukaryotes shows that high codon diversity is the rule, suggesting that these too may be functional.

SUBMITTER: Tian X 

PROVIDER: S-EPMC3907108 | biostudies-literature | 2014 Feb

REPOSITORIES: biostudies-literature


Publications

A conserved extraordinarily long serine homopolymer in Dictyostelid amoebae.

Tian X, Strassmann JE, Queller DC

Heredity 20131002 2



Similar Datasets

| S-EPMC6303869 | biostudies-literature
| S-EPMC6156593 | biostudies-literature
| S-EPMC3914002 | biostudies-literature
2013-05-20 | GSE46386 | GEO
2013-05-20 | E-GEOD-46386 | biostudies-arrayexpress
| S-EPMC4175584 | biostudies-literature
| S-EPMC3629772 | biostudies-literature
| S-EPMC7919456 | biostudies-literature
| S-EPMC8428833 | biostudies-literature
2018-05-31 | E-MTAB-6000 | biostudies-arrayexpress