Dataset Information

Extracting DNA words based on the sequence features: non-uniform distribution and integrity.

ABSTRACT: A DNA sequence can be viewed as an unknown language with words as its functional units. Given that most sequence alignment algorithms, such as motif discovery algorithms, depend on the quality of background information about the sequences, it is necessary to develop an ab initio algorithm that extracts the "words" based only on the DNA sequences. We considered non-uniform distribution and integrity to be two important features of a word, and on that basis developed an ab initio algorithm to extract "DNA words" that have potential functional meaning. A Kolmogorov-Smirnov test was used to test whether the occurrences of a candidate word in the DNA sequence are consistent with a uniform distribution, and integrity was judged by sequence and position alignment. Two random base sequences were adopted as negative controls, and an English book was used as a positive control to verify the algorithm. We applied the algorithm to the genomes of Saccharomyces cerevisiae and 10 strains of Escherichia coli to show the utility of the method. The results provide strong evidence that the algorithm is a promising tool for building a DNA dictionary ab initio. The method provides a fast way to screen for important DNA elements on a large scale and offers potential insights into the understanding of a genome.
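The uniformity check named in the abstract can be illustrated with a short sketch. This is not the authors' implementation: it assumes that a candidate word's start positions are scaled to [0, 1] by the sequence length and compared with Uniform(0, 1) using a one-sample Kolmogorov-Smirnov statistic and its asymptotic p-value. The function names and the toy sequence are hypothetical, and the integrity criterion is not covered.

```python
import math


def word_positions(seq, word):
    """Return the start index of every (possibly overlapping) occurrence of word."""
    positions = []
    start = seq.find(word)
    while start != -1:
        positions.append(start)
        start = seq.find(word, start + 1)
    return positions


def ks_uniform_statistic(positions, seq_len):
    """One-sample KS statistic D of positions/seq_len against Uniform(0, 1)."""
    n = len(positions)
    if n == 0:
        raise ValueError("need at least one occurrence")
    scaled = sorted(p / seq_len for p in positions)
    # Largest gap between the empirical CDF and the uniform CDF F(x) = x,
    # checked just after and just before each observed position.
    d_plus = max((i + 1) / n - x for i, x in enumerate(scaled))
    d_minus = max(x - i / n for i, x in enumerate(scaled))
    return max(d_plus, d_minus)


def ks_asymptotic_pvalue(d, n, terms=100):
    """Asymptotic p-value 2 * sum_{k>=1} (-1)^(k-1) * exp(-2 * k^2 * n * d^2).

    Assumption: the large-sample approximation is acceptable; for few
    occurrences an exact distribution would be more appropriate.
    """
    lam = math.sqrt(n) * d
    if lam < 0.2:
        # The series converges slowly here, and the true value is ~1.
        return 1.0
    total = sum((-1) ** (k - 1) * math.exp(-2.0 * (k * lam) ** 2)
                for k in range(1, terms + 1))
    return min(1.0, max(0.0, 2.0 * total))


if __name__ == "__main__":
    # Toy sequence (hypothetical): the word is clustered at the start.
    seq = "GATTACA" * 20 + "C" * 860
    pos = word_positions(seq, "GATTACA")
    d = ks_uniform_statistic(pos, len(seq))
    print(len(pos), round(d, 3), ks_asymptotic_pvalue(d, len(pos)))
```

A small p-value suggests that the word's occurrences are not uniformly spread along the sequence, which is the first of the two features the abstract associates with a word. In practice the sketch would be run over many candidate k-mers, which raises a multiple-testing question that it does not address.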

SUBMITTER: Li Z

PROVIDER: S-EPMC4727310 | biostudies-literature | 2016 Jan

REPOSITORIES: biostudies-literature


Publications

Extracting DNA words based on the sequence features: non-uniform distribution and integrity.

Li Z, Cao H, Cui Y, Zhang Y

Theoretical Biology & Medical Modelling, 2016-01-25


PMID: 26811154

Similar Datasets

Project description: Background: Understanding the long-term behavior of intracortically recorded signals is essential for improving the performance of brain-computer interfaces. However, few studies have systematically investigated chronic neural recordings from an implanted microelectrode array in the human brain. Methods: In this study, we apply wavelet decomposition to extract long-term stable features from neural signals obtained from a microelectrode array implanted in the motor cortex of a human with tetraplegia, and we demonstrate the utility of those features. Wavelet decomposition was applied to the raw voltage data to generate mean wavelet power (MWP) features, which were further divided into three sub-frequency bands: low-frequency MWP (lf-MWP, 0-234 Hz), mid-frequency MWP (mf-MWP, 234 Hz-3.75 kHz) and high-frequency MWP (hf-MWP, >3.75 kHz). We analyzed these features using data collected from two experiments that were repeated over the course of about 3 years and compared their signal stability and decoding performance with the more standard threshold crossings, local field potential (LFP) and multi-unit activity (MUA) features obtained from the raw voltage recordings. Results: All neural features could stably track neural information for over 3 years post-implantation and were less prone to signal degradation than threshold crossings. Furthermore, when used as input to support vector machine based decoding algorithms, the mf-MWP and MUA features each performed significantly better in classifying imagined motor tasks than the lf-MWP, hf-MWP, LFP, or threshold crossings. Conclusions: Our results suggest that MWP features in the appropriate frequency bands can provide an effective neural feature for brain-computer interfaces intended for chronic applications. Trial registration: This study was approved by the U.S. Food and Drug Administration (Investigational Device Exemption) and the Ohio State University Medical Center Institutional Review Board (Columbus, Ohio). The study conformed to institutional requirements for the conduct of human subjects research and was filed on ClinicalTrials.gov (Identifier NCT01997125).

OmicsDI is part of the ELIXIR infrastructure