Dataset Information

Hundreds of putatively functional small open reading frames in Drosophila.


ABSTRACT:

Background

The relationship between DNA sequence and encoded information is still an unsolved puzzle. The number of protein-coding genes in higher eukaryotes identified by genome projects is lower than was expected, while a considerable amount of putatively non-coding transcription has been detected. Functional small open reading frames (smORFs) are known to exist in several organisms. However, coding sequence detection methods are biased against detecting such very short open reading frames. Thus, a substantial number of non-canonical coding regions encoding short peptides might await characterization.

Results

Using bio-informatics methods, we have searched for smORFs of less than 100 amino acids in the putatively non-coding euchromatic DNA of Drosophila melanogaster, and initially identified nearly 600,000 of them. We have studied the pattern of conservation of these smORFs as coding entities between D. melanogaster and Drosophila pseudoobscura, their presence in syntenic and in transcribed regions of the genome, and their ratio of conservative versus non-conservative nucleotide changes. For negative controls, we compared the results with those obtained using random short sequences, while a positive control was provided by smORFs validated by proteomics data.

Conclusions

The combination of these analyses led us to postulate the existence of at least 401 functional smORFs in Drosophila, with the possibility that as many as 4,561 such functional smORFs may exist.
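The first step described in the Results, scanning sequence for smORFs encoding fewer than 100 amino acids, can be illustrated with a minimal Python sketch. This is not the authors' pipeline. The function name `find_smorfs`, the `min_aa` lower bound, the requirement of an ATG start codon, and the use of the standard stop codons are all assumptions made here for illustration.

```python
# Illustrative sketch only; names and thresholds are assumptions,
# not taken from the published analysis.
STOP_CODONS = {"TAA", "TAG", "TGA"}  # standard genetic code
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of an upper-case DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def find_smorfs(seq, min_aa=10, max_aa=100):
    """Return (strand, start, end, orf) tuples for ATG-to-stop ORFs.

    An ORF is kept when it encodes at least `min_aa` and fewer than
    `max_aa` amino acids (stop codon not counted). Coordinates are
    0-based and half-open; for the "-" strand they refer to the
    reverse-complemented sequence, not the input.
    """
    seq = seq.upper()
    hits = []
    for strand, s in (("+", seq), ("-", reverse_complement(seq))):
        for frame in range(3):
            start = None
            for i in range(frame, len(s) - 2, 3):
                codon = s[i:i + 3]
                if start is None and codon == "ATG":
                    # First in-frame ATG: gives the longest ORF to this stop.
                    start = i
                elif start is not None and codon in STOP_CODONS:
                    n_aa = (i - start) // 3
                    if min_aa <= n_aa < max_aa:
                        hits.append((strand, start, i + 3, s[start:i + 3]))
                    start = None
    return hits


if __name__ == "__main__":
    example = "CC" + "ATG" + "GCT" * 12 + "TAA" + "GG"
    for hit in find_smorfs(example):
        print(hit)
```

A scan like this only produces candidates. The abstract's later steps (conservation in D. pseudoobscura, synteny, transcription evidence, and the ratio of conservative to non-conservative changes) are what narrow the candidates down, and they are not covered by this sketch.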

SUBMITTER: Ladoukakis E 

PROVIDER: S-EPMC3334604 | biostudies-literature | 2011 Nov

REPOSITORIES: biostudies-literature

Publications

Hundreds of putatively functional small open reading frames in Drosophila.

Ladoukakis E, Pereira V, Magny EG, Eyre-Walker A, Couso JP

Genome Biology, 2011-11-25, issue 11


Similar Datasets

| S-EPMC10983949 | biostudies-literature
2020-03-14 | GSE131650 | GEO
2021-04-28 | GSE154491 | GEO
2019-07-03 | GSE125218 | GEO
| S-EPMC7289059 | biostudies-literature
| S-EPMC2813248 | biostudies-literature
2014-09-11 | E-GEOD-60384 | biostudies-arrayexpress
| S-EPMC7085969 | biostudies-literature
| S-EPMC10152738 | biostudies-literature
2014-09-11 | GSE60384 | GEO