
Dataset Information


Detecting the borders between coding and non-coding DNA regions in prokaryotes based on recursive segmentation and nucleotide doublets statistics.


ABSTRACT: BACKGROUND: Detecting the borders between coding and non-coding regions is an essential step in genome annotation, and information entropy measures are useful for describing the signals in genome sequences. However, the accuracy of previous entropy-based segmentation methods for finding these borders still needs to be improved. METHODS: In this study, we first applied a new recursive entropic segmentation method to DNA sequences to obtain preliminary significant cuts. A 22-symbol alphabet is used to capture the differential composition of nucleotide doublets and stop codon patterns along three phases in both DNA strands. This process requires no prior training datasets. RESULTS: Experimental results on three bacterial genomes, Rickettsia prowazekii, Borrelia burgdorferi and E. coli, show that, compared with previous segmentation methods, our approach improves the accuracy of finding the borders between coding and non-coding regions in DNA sequences. CONCLUSIONS: This paper presents a new segmentation method for prokaryotes based on the Jensen-Rényi divergence with a 22-symbol alphabet. For the three bacterial genomes, our method achieved higher accuracy than the A12_JR method in finding the borders between protein-coding and non-coding regions in DNA sequences.
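The abstract names the core technique, recursive segmentation driven by the Jensen-Rényi divergence, but gives no details. The Python sketch below illustrates only the general idea, under stated assumptions. For a candidate cut that splits a symbol sequence into a left part and a right part with length weights π_L and π_R, it computes JR_α = R_α(π_L P_L + π_R P_R) − π_L R_α(P_L) − π_R R_α(P_R), where R_α is the Rényi entropy of order α. It then takes the cut that maximizes this value and recurses on both halves while the maximum exceeds a threshold. The overlapping-dinucleotide encoding (`to_doublets`), the fixed `threshold` (used in place of a statistical significance test), `min_len`, and the default α = 0.5 are all hypothetical choices made for illustration. This is not the authors' 22-symbol alphabet or their implementation. Values of α in (0, 1] are used because the Rényi entropy is concave there, which keeps the divergence non-negative.

```python
import math
from collections import Counter
from typing import List, Sequence, Tuple


def to_doublets(dna: str) -> List[str]:
    """HYPOTHETICAL stand-in alphabet: overlapping dinucleotides on one strand.
    (The paper's 22-symbol alphabet, with phases, stop codons and both strands,
    is not reproduced here.)"""
    dna = dna.upper()
    return [dna[i:i + 2] for i in range(len(dna) - 1)]


def renyi_entropy(counts: Counter, alpha: float = 0.5) -> float:
    """Rényi entropy of order alpha, in bits, of an empirical symbol distribution."""
    total = sum(counts.values())
    if total == 0:
        return 0.0
    probs = [c / total for c in counts.values() if c > 0]
    if alpha == 1.0:
        # The limit alpha -> 1 is the Shannon entropy.
        return -sum(p * math.log2(p) for p in probs)
    return math.log2(sum(p ** alpha for p in probs)) / (1.0 - alpha)


def jensen_renyi(left: Counter, right: Counter, alpha: float = 0.5) -> float:
    """Jensen-Rényi divergence of two segments, weighted by segment length.
    With length weights, the mixture distribution equals the merged counts."""
    n_left = sum(left.values())
    n_right = sum(right.values())
    n = n_left + n_right
    merged = left + right  # Counter addition keeps only positive counts
    return (renyi_entropy(merged, alpha)
            - (n_left / n) * renyi_entropy(left, alpha)
            - (n_right / n) * renyi_entropy(right, alpha))


def best_cut(seq: Sequence[str], alpha: float, min_len: int) -> Tuple[int, float]:
    """Scan every cut k (left = seq[:k]) that leaves both parts >= min_len long
    and return (k, divergence) for the maximum; k = -1 if no cut is allowed."""
    left: Counter = Counter()
    right: Counter = Counter(seq)
    best_pos, best_div = -1, -math.inf
    for k in range(1, len(seq)):
        # Move symbol seq[k-1] from the right part to the left part.
        sym = seq[k - 1]
        left[sym] += 1
        right[sym] -= 1
        if k < min_len or len(seq) - k < min_len:
            continue
        div = jensen_renyi(left, right, alpha)
        if div > best_div:
            best_pos, best_div = k, div
    return best_pos, best_div


def segment(seq: Sequence[str], alpha: float = 0.5, threshold: float = 0.3,
            min_len: int = 20, offset: int = 0) -> List[int]:
    """Recursively cut seq; return sorted cut positions as indices into the
    original symbol sequence. `threshold` and `min_len` are HYPOTHETICAL."""
    if len(seq) < 2 * min_len:
        return []
    pos, div = best_cut(seq, alpha, min_len)
    if pos < 0 or div < threshold:
        return []
    return (segment(seq[:pos], alpha, threshold, min_len, offset)
            + [offset + pos]
            + segment(seq[pos:], alpha, threshold, min_len, offset + pos))
```

As a usage example, `segment(to_doublets("AT" * 100 + "GC" * 100))` reports a single cut near doublet index 200, where the composition changes. A compositionally uniform sequence such as `"AT" * 200` yields no cut. In this sketch the threshold is the main tuning knob: with α < 1, rare symbols (such as the single boundary doublet `TG` in the toy example) are weighted more heavily, so a threshold that is too low can produce spurious cuts.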

SUBMITTER: Deng S 

PROVIDER: S-EPMC3535712 | biostudies-literature | 2012

REPOSITORIES: biostudies-literature


Publications

Detecting the borders between coding and non-coding DNA regions in prokaryotes based on recursive segmentation and nucleotide doublets statistics.

Deng Suping, Shi Yixiang, Yuan Liyun, Li Yixue, Ding Guohui

BMC Genomics, 17 December 2012



Similar Datasets

| S-EPMC5181554 | biostudies-literature
| S-EPMC8275324 | biostudies-literature
| S-EPMC6292491 | biostudies-literature
| S-EPMC2746451 | biostudies-literature
| S-EPMC3111233 | biostudies-literature
| S-EPMC3602041 | biostudies-literature
| S-EPMC6604332 | biostudies-literature
| S-EPMC8016883 | biostudies-literature
| S-EPMC3230914 | biostudies-literature
| S-EPMC2268707 | biostudies-literature