
Dataset Information


Introns Structure Patterns of Variation in Nucleotide Composition in Arabidopsis thaliana and Rice Protein-Coding Genes.


ABSTRACT: Plant genomes present a continuous range of variation in nucleotide composition (G + C content). In coding regions, G + C-poor species tend to have unimodal distributions of G + C content among genes within genomes and slight 5'-3' gradients along genes. In contrast, G + C-rich species display bimodal distributions of G + C content among genes and steep 5'-3' decreasing gradients along genes. The causes of these peculiar patterns are still poorly understood. Within two species (Arabidopsis thaliana and rice), each representative of one side of the continuum, we studied the consequences of intron presence on coding region and intron G + C content at different scales. By properly taking intron structure into account, we showed that, in both species, intron presence is associated with step changes in nucleotide, codon, and amino acid composition. This suggests that introns have a barrier effect structuring G + C content along genes and that previous continuous characterizations of the 5'-3' gradients were artifactual. In external gene regions (located upstream of the first intron or downstream of the last intron), species-specific factors, such as GC-biased gene conversion, are shaping G + C content, whereas in internal gene regions (surrounded by introns), G + C content is likely constrained to remain within a range common to both species.

SUBMITTER: Ressayre A 

PROVIDER: S-EPMC4684703 | biostudies-other | 2015 Oct

REPOSITORIES: biostudies-other


Publications

Introns Structure Patterns of Variation in Nucleotide Composition in Arabidopsis thaliana and Rice Protein-Coding Genes.

Adrienne Ressayre, Sylvain Glémin, Pierre Montalent, Laurana Serre-Giardi, Christine Dillmann, Johann Joets

Genome Biology and Evolution, 2015-10-07, issue 10



Similar Datasets

| S-EPMC1461127 | biostudies-other
| S-EPMC1431726 | biostudies-literature
| S-EPMC7747988 | biostudies-literature
| S-EPMC4906602 | biostudies-other
| S-EPMC4396512 | biostudies-literature
| S-EPMC2551617 | biostudies-literature
| S-EPMC1892575 | biostudies-literature
| S-EPMC3608163 | biostudies-literature
| S-EPMC3601123 | biostudies-literature
| S-EPMC4128822 | biostudies-literature