Dataset Information

RCPdb: An evolutionary classification and codon usage database for repeat-containing proteins.


ABSTRACT: Over 3% of human proteins contain single amino acid repeats (repeat-containing proteins, RCPs). Many repeats (homopeptides) localize to important proteins involved in transcription, and the expansion of certain repeats, in particular poly-Q and poly-A tracts, can also lead to the development of neurological diseases. Previous studies have suggested that the homopeptide makeup is a result of the presence of G+C-rich tracts in the encoding genes and that expansion occurs via replication slippage. Here, we have performed a large-scale genomic analysis of the variation of the genes encoding RCPs in 13 species and present these data in an online database (http://repeats.med.monash.edu.au/genetic_analysis/). This resource allows rapid comparison and analysis of RCPs, homopeptides, and their underlying genetic tracts across the eukaryotic species considered. We report three major findings. First, there is a bias for a small subset of codons being reiterated within homopeptides, and there is no G+C or A+T bias relative to the organism's transcriptome. Second, single base pair transversions from the homocodon are unusually common and may represent a mechanism of reducing the rate of homopeptide mutations. Third, homopeptides that are conserved across different species lie within regions that are under stronger purifying selection in contrast to nonconserved homopeptides.
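As a rough illustration of the kind of analysis the abstract describes (locating homopeptides and tallying the codons that encode them), here is a minimal Python sketch. It is not the RCPdb pipeline: the function name `homopeptide_codon_usage`, the minimum run length of 5 residues, and the use of the standard genetic code are all assumptions made for this example.

```python
import re
from collections import Counter
from itertools import product

# Standard genetic code (NCBI translation table 1), built in TCAG order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): a for c, a in zip(product(BASES, repeat=3), AMINO)}


def homopeptide_codon_usage(cds, min_len=5):
    """Find single amino acid repeats in an in-frame coding sequence.

    Returns a list of (amino_acid, start_residue, length, codon_counts)
    tuples. `min_len` is an assumed threshold, not a value from the paper.
    """
    cds = cds.upper()
    # Split into complete codons, ignoring any trailing partial codon.
    codons = [cds[i:i + 3] for i in range(0, len(cds) - len(cds) % 3, 3)]
    # Unknown codons (e.g. containing N) are translated to "X".
    protein = "".join(CODON_TABLE.get(c, "X") for c in codons)
    # A run is one residue followed by at least (min_len - 1) copies of it.
    pattern = re.compile(r"([ACDEFGHIKLMNPQRSTVWY])\1{%d,}" % (min_len - 1))
    tracts = []
    for m in pattern.finditer(protein):
        run_codons = codons[m.start():m.end()]
        tracts.append(
            (m.group(1), m.start(), m.end() - m.start(), Counter(run_codons))
        )
    return tracts


# Hypothetical sequence: a poly-Q tract with one CAG -> CAA interruption.
example_cds = "ATG" + "CAG" * 4 + "CAA" + "CAG" * 2 + "GCT" + "TAA"
tracts = homopeptide_codon_usage(example_cds)
for aa, start, length, counts in tracts:
    print(aa, start, length, dict(counts))
```

Counting codons per tract in this way makes it possible to see whether one codon dominates a homopeptide and where single-base interruptions of the homocodon occur.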

SUBMITTER: Faux NG 

PROVIDER: S-EPMC1899123 | biostudies-literature | 2007 Jul

REPOSITORIES: biostudies-literature

Publications

RCPdb: An evolutionary classification and codon usage database for repeat-containing proteins.

Noel G Faux, Gavin A Huttley, Khalid Mahmood, Geoffrey I Webb, Maria Garcia de la Banda, James C Whisstock

Genome Research, 2007-06-13, issue 7


Similar Datasets

| S-EPMC6244663 | biostudies-literature
| S-EPMC8317675 | biostudies-literature
| S-EPMC1540718 | biostudies-literature
| S-EPMC2839124 | biostudies-literature
| S-EPMC8613526 | biostudies-literature
| S-EPMC2799417 | biostudies-literature
| S-EPMC7953982 | biostudies-literature
| S-EPMC8704784 | biostudies-literature
| S-EPMC6934141 | biostudies-literature
| S-EPMC3504468 | biostudies-literature