
Dataset Information


De novo search for non-coding RNA genes in the AT-rich genome of Dictyostelium discoideum: performance of Markov-dependent genome feature scoring.


ABSTRACT: Genome data are increasingly important in the computational identification of novel regulatory non-coding RNAs (ncRNAs). However, most ncRNA gene-finders are either specialized to well-characterized ncRNA gene families or require comparisons of closely related genomes. We developed a method for de novo screening for ncRNA genes with a nucleotide composition that stands out against the background genome based on a partial sum process. We compared the performance when assuming independent and first-order Markov-dependent nucleotides, respectively, and used Karlin-Altschul and Karlin-Dembo statistics to evaluate the significance of hits. We hypothesized that a first-order Markov-dependent process might have better power to detect ncRNA genes since nearest-neighbor models have been shown to be successful in predicting RNA structures. A model based on a first-order partial sum process (analyzing overlapping dinucleotides) had better sensitivity and specificity than a zeroth-order model when applied to the AT-rich genome of the amoeba Dictyostelium discoideum. In this genome, we detected 94% of previously known ncRNA genes (at this sensitivity, the false positive rate was estimated to be 25% in a simulated background). The predictions were further refined by clustering candidate genes according to sequence similarity and/or searching for an ncRNA-associated upstream element. We experimentally verified six out of 10 tested ncRNA gene predictions. We conclude that higher-order models, in combination with other information, are useful for identification of novel ncRNA gene families in single-genome analysis of D. discoideum. Our generalizable approach extends the range of genomic data that can be searched for novel ncRNA genes using well-grounded statistical methods.
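The first-order partial sum scan described in the abstract can be illustrated with a minimal sketch. The abstract does not give the authors' exact scoring scheme, so the log-odds scores below, the pseudocount, and the function names `dinucleotide_scores` and `max_scoring_segment` are assumptions made for illustration only. The significance evaluation with Karlin-Altschul and Karlin-Dembo statistics is not implemented here.

```python
import math
from collections import Counter

ALPHABET = "ACGT"

def transition_probs(seqs, pseudocount=1.0):
    """Estimate first-order transition probabilities P(b | a) from sequences."""
    counts = Counter()
    for s in seqs:
        for a, b in zip(s, s[1:]):          # overlapping dinucleotides
            counts[a + b] += 1
    probs = {}
    for a in ALPHABET:
        total = sum(counts[a + b] + pseudocount for b in ALPHABET)
        for b in ALPHABET:
            probs[a + b] = (counts[a + b] + pseudocount) / total
    return probs

def dinucleotide_scores(foreground, background, pseudocount=1.0):
    """Assumed scoring scheme: log-odds of foreground vs. background transitions."""
    fg = transition_probs(foreground, pseudocount)
    bg = transition_probs(background, pseudocount)
    return {k: math.log(fg[k] / bg[k]) for k in fg}

def max_scoring_segment(seq, scores):
    """Scan the partial sum of dinucleotide scores, resetting it to zero
    whenever it drops to zero or below, and return the best segment found.

    Returns (score, (start, end)) with end exclusive, in nucleotide coordinates.
    """
    best, best_span = 0.0, (0, 0)
    running, start = 0.0, 0
    for i in range(len(seq) - 1):
        running += scores.get(seq[i:i + 2], 0.0)
        if running <= 0:
            running, start = 0.0, i + 1     # restart after this dinucleotide
        elif running > best:
            best, best_span = running, (start, i + 2)
    return best, best_span
```

In a toy setting, training the foreground on a GC-rich sequence and the background on an AT-rich one, then scanning an AT-rich sequence with an embedded GC-rich stretch, should yield a highest-scoring segment that covers the embedded stretch. A real screen would report every segment above a significance threshold rather than only the single best one.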

SUBMITTER: Larsson P 

PROVIDER: S-EPMC2413156 | biostudies-literature | 2008 Jun

REPOSITORIES: biostudies-literature


Publications

De novo search for non-coding RNA genes in the AT-rich genome of Dictyostelium discoideum: performance of Markov-dependent genome feature scoring.

Pontus Larsson, Andrea Hinas, David H. Ardell, Leif A. Kirsebom, Anders Virtanen, Fredrik Söderbom

Genome Research, 2008 Mar 17, issue 6



Similar Datasets

| S-EPMC1352341 | biostudies-literature
| S-EPMC516072 | biostudies-literature
| S-EPMC3492061 | biostudies-literature
2017-03-07 | GSE90829 | GEO
| S-EPMC3460934 | biostudies-literature
| S-EPMC2168402 | biostudies-literature
| PRJNA129601 | ENA
| PRJNA262669 | ENA
| PRJNA143419 | ENA
| PRJNA242439 | ENA