
Dataset Information


Arabidopsis intragenomic conserved noncoding sequence.


ABSTRACT: After the most recent tetraploidy in the Arabidopsis lineage, most gene pairs lost one, but not both, of their duplicates. We manually inspected the 3,179 retained gene pairs and their surrounding gene space still present in the genome using a custom-made viewer application. The display of these pairs allowed us to define intragenic conserved noncoding sequences (CNSs), identify exon annotation errors, and discover potentially new genes. Using a strict algorithm to sort high-scoring pair sequences from the bl2seq data, we created a database of 14,944 intragenomic Arabidopsis CNSs. The mean CNS length is 31 bp, ranging from 15 to 285 bp. There are approximately 1.7 CNSs associated with a typical gene, and Arabidopsis CNSs are found in all areas around exons, most frequently in the 5' upstream region. Gene ontology classifications related to transcription, regulation, or "response to ..." external or endogenous stimuli, especially hormones, tend to be significantly overrepresented among genes containing a large number of CNSs, whereas protein localization, transport, and metabolism are common among genes with no CNSs. There is a 1.5% overlap between these CNSs and the 218,982 putative RNAs in the Arabidopsis Small RNA Project database, allowing for two mismatches. These CNSs provide a unique set of noncoding sequences enriched for function. CNS function is implied by evolutionary conservation and independently supported because CNS-richness predicts regulatory gene ontology categories.

SUBMITTER: Thomas BC 

PROVIDER: S-EPMC1805546 | biostudies-literature | 2007 Feb

REPOSITORIES: biostudies-literature


Publications

Arabidopsis intragenomic conserved noncoding sequence.

Brian C. Thomas, Lakshmi Rapaka, Eric Lyons, Brent Pedersen, Michael Freeling

Proceedings of the National Academy of Sciences of the United States of America, 2007-02-14, issue 9



Similar Datasets

| S-EPMC8233505 | biostudies-literature
| S-EPMC3270375 | biostudies-literature
| S-EPMC3879966 | biostudies-literature
2021-01-04 | GSE164159 | GEO
| S-EPMC2000400 | biostudies-literature
| S-EPMC8256870 | biostudies-literature
| S-EPMC3971591 | biostudies-literature
| S-EPMC2790875 | biostudies-literature
| S-EPMC403677 | biostudies-literature
2010-05-18 | E-GEOD-17135 | biostudies-arrayexpress