
Dataset Information


Mining mammalian transcript data for functional long non-coding RNAs.


ABSTRACT:

BACKGROUND: The role of long non-coding RNAs (lncRNAs) in controlling gene expression has garnered increased interest in recent years. Sequencing projects, such as Fantom3 for mouse and H-InvDB for human, have generated abundant data on transcribed components of mammalian cells, the majority of which appear not to be protein-coding. However, much of the non-protein-coding transcriptome could merely be a consequence of 'transcription noise'. It is therefore essential to use bioinformatic approaches to identify the likely functional candidates in a high-throughput manner.

PRINCIPAL FINDINGS: We derived a scheme for classifying and annotating likely functional lncRNAs in mammals. Using the available experimental full-length cDNA data sets for human and mouse, we identified 78 lncRNAs that are either syntenically conserved between human and mouse, or that originate from the same protein-coding genes. Of these, 11 have significant sequence homology. We found that these lncRNAs exhibit: (i) patterns of codon substitution typical of non-coding transcripts; (ii) preservation of sequences in distant mammals such as dog and cow; (iii) significant sequence conservation relative to their corresponding flanking regions (in 50% of cases, flanking regions have no homology at all; in the remainder, the degree of conservation is significantly lower); (iv) existence mostly as single-exon forms (8/11); and (v) presence of conserved and stable secondary structure motifs within them. We further identified orthologous protein-coding genes that contribute to the pool of lncRNAs; among these, genes implicated in carcinogenesis are significantly over-represented.

CONCLUSION: Our comparative mammalian genomics approach, coupled with evolutionary analysis, identified a small population of conserved long non-protein-coding RNAs (lncRNAs) that are potentially functional across Mammalia. Additionally, our analysis indicates that amongst the orthologous protein-coding genes that produce lncRNAs, those implicated in cancer pathogenesis are significantly over-represented, suggesting that these lncRNAs could play an important role in cancer pathomechanisms.
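
ILLUSTRATION: The over-representation claim in the abstract can be made concrete with a one-sided hypergeometric test. This is only a minimal sketch of one common way to test enrichment; the record does not state which statistical test the authors used, and the function name `hypergeom_sf` and all gene counts below are hypothetical placeholders, not values from the study.

```python
from math import comb


def hypergeom_sf(k, M, n, N):
    """Return P(X >= k) for X ~ Hypergeometric.

    M: total genes in the background (population size)
    n: background genes carrying the annotation (e.g. cancer-implicated)
    N: genes in the set of interest (e.g. lncRNA-producing genes)
    k: genes in the set of interest that carry the annotation
    """
    upper = min(n, N)
    total = comb(M, N)
    # Sum the exact probabilities of observing k, k+1, ..., upper annotated genes.
    tail = sum(comb(n, i) * comb(M - n, N - i) for i in range(k, upper + 1))
    return tail / total


# Hypothetical counts chosen purely for illustration (NOT from the paper).
background_genes = 20000
cancer_genes = 500
lncrna_producing_genes = 100
overlap = 10

p_value = hypergeom_sf(overlap, background_genes, cancer_genes, lncrna_producing_genes)
print(f"one-sided enrichment p-value: {p_value:.3g}")
```

A small p-value here would indicate that the annotated genes appear in the set of interest more often than expected by chance under random sampling from the background.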

SUBMITTER: Khachane AN 

PROVIDER: S-EPMC2859052 | biostudies-literature | 2010

REPOSITORIES: biostudies-literature


Publications

Mining mammalian transcript data for functional long non-coding RNAs.

Khachane AN, Harrison PM

PLoS One, 2010-04-23, issue 4



Similar Datasets

S-EPMC3492712 | biostudies-other
S-EPMC10487962 | biostudies-literature
S-EPMC3061462 | biostudies-literature
S-EPMC6128939 | biostudies-literature
S-EPMC6378714 | biostudies-literature
S-EPMC8508152 | biostudies-literature
S-EPMC9652368 | biostudies-literature
S-EPMC5577775 | biostudies-literature
S-EPMC6262761 | biostudies-literature
S-EPMC6779387 | biostudies-literature