
Dataset Information


Genome-wide de novo prediction of cis-regulatory binding sites in prokaryotes.


ABSTRACT: Although cis-regulatory binding sites (CRBSs) are at least as important as the coding sequences in a genome, our general understanding of them in most sequenced genomes is very limited due to the lack of efficient and accurate experimental and computational methods for their characterization, which has largely hindered our understanding of many important biological processes. In this article, we describe a novel algorithm for genome-wide de novo prediction of CRBSs with high accuracy. We designed our algorithm to circumvent three identified difficulties for CRBS prediction using comparative genomics principles based on a new method for the selection of reference genomes, a new metric for measuring the similarity of CRBSs, and a new graph clustering procedure. When operon structures are correctly predicted, our algorithm can predict 81% of known individual binding sites belonging to 94% of known cis-regulatory motifs in the Escherichia coli K12 genome, while achieving high prediction specificity. Our algorithm has also achieved similar prediction accuracy in the Bacillus subtilis genome, suggesting that it is very robust, and thus can be applied to any other sequenced prokaryotic genome. When compared with the prior state-of-the-art algorithms, our algorithm outperforms them in both prediction sensitivity and specificity.

SUBMITTER: Zhang S 

PROVIDER: S-EPMC2691844 | biostudies-other | 2009 Jun

REPOSITORIES: biostudies-other


Publications

Genome-wide de novo prediction of cis-regulatory binding sites in prokaryotes.

Zhang Shaoqiang, Xu Minli, Li Shan, Su Zhengchang

Nucleic Acids Research, 2009-04-21, issue 10



Similar Datasets

| S-EPMC4757040 | biostudies-literature
| S-EPMC3225181 | biostudies-literature
| S-EPMC1129096 | biostudies-literature
| S-EPMC5984344 | biostudies-literature
| S-EPMC2377448 | biostudies-literature
| S-EPMC4182448 | biostudies-literature
| S-EPMC3235160 | biostudies-literature
| S-EPMC521067 | biostudies-literature
| S-EPMC4265420 | biostudies-literature
| S-EPMC514443 | biostudies-literature