Dataset Information

Short sequence motifs, overrepresented in mammalian conserved non-coding sequences.


ABSTRACT:

Background

A substantial fraction of non-coding DNA sequences of multicellular eukaryotes is under selective constraint. In particular, approximately 5% of the human genome consists of conserved non-coding sequences (CNSs). CNSs differ from other genomic sequences in their nucleotide composition and must play important functional roles, which mostly remain obscure.

Results

We investigated the relative abundances of short sequence motifs in all human CNSs present in the human/mouse whole-genome alignments against three background sets of sequences: (i) weakly conserved or unconserved non-coding sequences (non-CNSs); (ii) near-promoter sequences (located between nucleotides -500 and -1500 relative to the transcription start); and (iii) random sequences with the same nucleotide composition as CNSs. Compared to non-CNSs and near-promoter sequences, CNSs possess an excess of AT-rich motifs, often containing runs of identical nucleotides. In contrast, compared to random sequences, CNSs contain an excess of GC-rich motifs which, however, lack CpG dinucleotides. Thus, the abundance of short sequence motifs in human CNSs, taken as a whole, is mostly determined by their overall compositional properties and not by overrepresentation of any specific short motifs. These properties are: (i) the high AT content of CNSs; (ii) a tendency of A's and T's to clump, probably due to context-dependent mutation; (iii) the presence of short GC-rich regions; and (iv) avoidance of CpG contexts, due to their hypermutability. Only a small number of the short motifs overrepresented in all human CNSs are similar to binding sites of transcription factors from the FOX family.
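The comparison described above can be illustrated with a minimal sketch. It is not the authors' pipeline: the function names, the pseudocount, and the choice of a log2 frequency ratio as the enrichment score are all assumptions made for illustration. The sketch counts k-mers in a foreground set (standing in for CNSs) and a background set, scores each k-mer by its log2 frequency ratio, and also computes the frequency a motif would be expected to have in random sequences with a given nucleotide composition.

```python
from collections import Counter
from math import log2, prod


def count_kmers(seqs, k):
    """Count overlapping k-mers made only of A, C, G, T."""
    counts = Counter()
    for seq in seqs:
        seq = seq.upper()
        for i in range(len(seq) - k + 1):
            kmer = seq[i:i + k]
            if set(kmer) <= set("ACGT"):
                counts[kmer] += 1
    return counts


def log2_enrichment(foreground, background, k, pseudo=1.0):
    """Log2 ratio of k-mer frequencies, foreground vs. background.

    The pseudocount (an assumption of this sketch) is added to every
    one of the 4**k possible k-mers to avoid division by zero.
    """
    fg = count_kmers(foreground, k)
    bg = count_kmers(background, k)
    fg_total = sum(fg.values()) + pseudo * 4 ** k
    bg_total = sum(bg.values()) + pseudo * 4 ** k
    return {
        kmer: log2(((fg[kmer] + pseudo) / fg_total)
                   / ((bg[kmer] + pseudo) / bg_total))
        for kmer in set(fg) | set(bg)
    }


def base_frequencies(seqs):
    """Mononucleotide frequencies of a sequence set."""
    counts = count_kmers(seqs, 1)
    total = sum(counts.values())
    return {base: counts[base] / total for base in "ACGT"}


def expected_frequency(kmer, base_freqs):
    """Expected k-mer frequency in random sequences with the given
    nucleotide composition (independent positions assumed)."""
    return prod(base_freqs[base] for base in kmer)
```

Under these assumptions, a positive score from log2_enrichment marks a motif that is more frequent in the foreground than in the background, and comparing an observed frequency with expected_frequency corresponds to the third, composition-matched background.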

Conclusion

Human CNSs as a whole appear to be too broad a class of sequences to possess strong footprints of any short sequence-specific functions. Such footprints should be studied at the level of functional subclasses of CNSs, such as those that flank genes with a particular pattern of expression. The overall properties of CNSs are affected by patterns of mutation, suggesting that the selection responsible for their conservation is not always very strong.

SUBMITTER: Minovitsky S 

PROVIDER: S-EPMC2176071 | biostudies-literature | 2007 Oct

REPOSITORIES: biostudies-literature

Publications

Short sequence motifs, overrepresented in mammalian conserved non-coding sequences.

Simon Minovitsky, Philip Stegmaier, Alexander Kel, Alexey S. Kondrashov, Inna Dubchak

BMC Genomics, 2007-10-18


Similar Datasets

| S-EPMC2646272 | biostudies-literature
| S-EPMC2831315 | biostudies-literature
| S-EPMC3574231 | biostudies-literature
| S-EPMC526512 | biostudies-literature
| S-EPMC1838922 | biostudies-literature
| S-EPMC5577775 | biostudies-literature
| S-EPMC3826502 | biostudies-literature
| S-EPMC3650315 | biostudies-literature
| S-EPMC3105428 | biostudies-literature