Dataset Information

De novo computational prediction of non-coding RNA genes in prokaryotic genomes.


ABSTRACT:

Motivation

The computational identification of non-coding RNA (ncRNA) genes represents one of the most important and challenging problems in computational biology. Existing methods for ncRNA gene prediction rely mostly on homology information, thus limiting their applications to ncRNA genes with known homologues.

Results

We present a novel de novo prediction algorithm for ncRNA genes using features derived from the sequences and structures of known ncRNA genes in comparison to decoys. Using these features, we have trained a neural network-based classifier and have applied it to Escherichia coli and Sulfolobus solfataricus for genome-wide prediction of ncRNAs. Our method has an average prediction sensitivity and specificity of 68% and 70%, respectively, for identifying windows with potential for ncRNA genes in E. coli. By combining windows of different sizes and using positional filtering strategies, we predicted 601 candidate ncRNAs and recovered 41% of known ncRNAs in E. coli. We experimentally investigated six novel candidates using Northern blot analysis and found expression of three candidates: one represents a potential new ncRNA, one is associated with stable mRNA decay intermediates and one is a case of either a potential riboswitch or transcription attenuator involved in the regulation of cell division. In general, our approach enables the identification of both cis- and trans-acting ncRNAs in partially or completely sequenced microbial genomes without requiring homology or structural conservation.
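
The window-based scanning described above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not specify window sizes, step length, or the positional filtering rules, so the values and helper names below (`sliding_windows`, `merge_overlapping`, `predict_candidates`, `toy_classifier`) are hypothetical, and the GC-content rule merely stands in for the trained neural network. The sketch only shows how windows of several sizes might be scored and how overlapping positive windows might be merged into candidate regions.

```python
from typing import Callable, Iterator, List, Sequence, Tuple

Interval = Tuple[int, int]  # half-open [start, end) coordinates on the sequence


def sliding_windows(genome_length: int, size: int, step: int) -> Iterator[Interval]:
    """Yield fixed-size half-open windows that fit entirely inside the genome."""
    for start in range(0, genome_length - size + 1, step):
        yield (start, start + size)


def merge_overlapping(intervals: List[Interval]) -> List[Interval]:
    """Merge overlapping or touching intervals into larger candidate regions."""
    merged: List[Interval] = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1]:
            # Extend the previous region instead of opening a new one.
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


def predict_candidates(
    sequence: str,
    window_sizes: Sequence[int],
    step: int,
    is_positive: Callable[[str], bool],
) -> List[Interval]:
    """Score windows of every size and merge the positive ones."""
    hits = [
        (start, end)
        for size in window_sizes
        for (start, end) in sliding_windows(len(sequence), size, step)
        if is_positive(sequence[start:end])
    ]
    return merge_overlapping(hits)


def toy_classifier(window: str) -> bool:
    """Placeholder assumption: flag GC-rich windows (NOT the paper's classifier)."""
    gc_count = sum(base in "GC" for base in window)
    return gc_count / len(window) >= 0.6


if __name__ == "__main__":
    demo = "A" * 10 + "GC" * 5 + "A" * 10
    print(predict_candidates(demo, window_sizes=[10], step=5, is_positive=toy_classifier))
```

In a real pipeline the placeholder would be replaced by a classifier trained on sequence and structure features, and the merged regions would then pass through whatever positional filters the method defines.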

Availability

The source code and results are available at http://csbl.bmb.uga.edu/publications/materials/tran/.

SUBMITTER: Tran TT 

PROVIDER: S-EPMC2773258 | biostudies-literature | 2009 Nov

REPOSITORIES: biostudies-literature

Publications

De novo computational prediction of non-coding RNA genes in prokaryotic genomes.

Thao T. Tran, Fengfeng Zhou, Sarah Marshburn, Mark Stead, Sidney R. Kushner, Ying Xu

Bioinformatics (Oxford, England), 2009 Sep 10, issue 22


Similar Datasets

| S-EPMC3441637 | biostudies-literature
| S-EPMC3213175 | biostudies-literature
| S-EPMC4495290 | biostudies-literature
| S-EPMC2765279 | biostudies-literature
| S-EPMC140549 | biostudies-literature
| S-EPMC3504067 | biostudies-literature
| S-EPMC3072456 | biostudies-literature
| S-EPMC4042238 | biostudies-literature
| S-EPMC9170768 | biostudies-literature
| S-EPMC2413156 | biostudies-literature