
Dataset Information

Computational identification of protein coding potential of conserved sequence tags through cross-species evolutionary analysis.


ABSTRACT: The identification of conserved sequence tags (CSTs) through comparative genome analysis may reveal important regulatory elements involved in shaping the spatio-temporal expression of genetic information. It is well known that the largest fraction of CSTs observed in human-mouse comparisons corresponds to protein coding exons, due to their strong evolutionary constraints. As we still do not know the complete gene inventory of the human and mouse genomes, it is of the utmost importance to establish whether detected conserved sequences are genes or not. We propose here a simple algorithm that, based on the observation of the specific evolutionary dynamics of coding sequences, efficiently discriminates between coding and non-coding CSTs. The application of this method may help the validation of predicted genes, the prediction of alternative splicing patterns in known and unknown genes, and the definition of a dictionary of non-coding regulatory elements.
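The abstract does not spell out the algorithm, so the following is only an illustrative sketch of one signal commonly associated with the evolutionary dynamics of coding sequences: in a conserved coding region, differences between two species tend to be synonymous when read in the true reading frame. The function names, the scoring rule, and the toy sequences are assumptions made for illustration and are not the authors' published method.

```python
from itertools import product

# Standard genetic code, built in TCAG order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): a for c, a in zip(product(BASES, repeat=3), AMINO)}


def synonymous_fraction(seq_a, seq_b, frame):
    """Fraction of differing codons that are synonymous in the given frame.

    Hypothetical helper: expects two gap-free, equal-length aligned
    sequences (upper-case A/C/G/T) and a frame offset of 0, 1 or 2.
    Returns None when no codon differs in that frame.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    differing = 0
    synonymous = 0
    # Walk complete codons only; a trailing partial codon is ignored.
    for i in range(frame, len(seq_a) - 2, 3):
        codon_a, codon_b = seq_a[i:i + 3], seq_b[i:i + 3]
        if codon_a == codon_b:
            continue
        differing += 1
        if CODON_TABLE[codon_a] == CODON_TABLE[codon_b]:
            synonymous += 1
    return synonymous / differing if differing else None


def best_frame(seq_a, seq_b):
    """Return (frame, score) for the frame with the highest synonymous fraction."""
    scores = {f: synonymous_fraction(seq_a, seq_b, f) for f in range(3)}
    scored = {f: s for f, s in scores.items() if s is not None}
    if not scored:
        return None
    frame = max(scored, key=scored.get)
    return frame, scored[frame]


# Toy aligned pair (invented for illustration): all three differences
# fall on third codon positions in frame 0.
toy_a = "ATGGCTCTGAAA"
toy_b = "ATGGCCCTTAAG"
print(best_frame(toy_a, toy_b))
```

A frame in which nearly all differences are synonymous hints at coding potential, whereas similar low scores across all three frames are more consistent with a non-coding CST. A practical classifier would also need to handle the reverse strand, alignment gaps, and a statistical threshold.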

SUBMITTER: Mignone F 

PROVIDER: S-EPMC169873 | biostudies-literature | 2003 Aug

REPOSITORIES: biostudies-literature

Publications

Computational identification of protein coding potential of conserved sequence tags through cross-species evolutionary analysis.

Flavio Mignone, Giorgio Grillo, Sabino Liuni, Graziano Pesole

Nucleic Acids Research, 2003-08-01, issue 15

Similar Datasets

| S-EPMC441624 | biostudies-literature
| S-EPMC5381583 | biostudies-literature
| S-EPMC4203583 | biostudies-literature
| S-EPMC4608865 | biostudies-literature
| S-EPMC3152334 | biostudies-literature
| S-EPMC3855098 | biostudies-literature
| S-EPMC6099560 | biostudies-literature
| S-EPMC5054452 | biostudies-literature
| S-EPMC310835 | biostudies-literature
| S-EPMC3087714 | biostudies-literature