Dataset Information

Markovian language model of the DNA and its information content.

ABSTRACT: This work proposes a Markovian, memoryless model of DNA that greatly reduces its complexity. We encode nucleotide sequences into symbolic sequences, called words, from which we establish a meaningful word length and groups of words that share symbolic similarities. Interpreting a node as a group of similar words, and an edge as the functional connectivity between two groups, allows us to construct a network of the grammatical rules governing the appearance of groups of words in the DNA. Our model allows us to predict the transition between groups of words in the DNA with unprecedented accuracy, and to easily calculate many informational quantities that better characterize the DNA. In addition, we reduce the DNA of known bacteria to a network of only tens of nodes, show how our model can be used to detect similar (or dissimilar) genes in different organisms, and identify which sequences of symbols are responsible for most of the information content of the DNA. Therefore, the DNA can indeed be treated as a language, a Markovian language, where a 'word' is an element of a group, and its grammar represents the rules behind the probability of transitions between any two groups.
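The abstract's central idea, words whose successors are governed by transition probabilities, can be illustrated with a minimal sketch. This is not the authors' method: it assumes fixed-length, non-overlapping words, treats every distinct word as its own state rather than grouping similar words, and runs on a made-up toy sequence. The function names (`split_into_words`, `transition_probabilities`, `conditional_entropy`) are hypothetical and chosen only for illustration.

```python
from collections import Counter, defaultdict
from math import log2


def split_into_words(sequence, word_length):
    """Cut a nucleotide string into consecutive, non-overlapping words."""
    usable = len(sequence) - len(sequence) % word_length
    return [sequence[i:i + word_length] for i in range(0, usable, word_length)]


def transition_probabilities(words):
    """Estimate P(next word | current word) from adjacent word pairs."""
    counts = defaultdict(Counter)
    for current, following in zip(words, words[1:]):
        counts[current][following] += 1
    return {
        current: {nxt: n / sum(followers.values()) for nxt, n in followers.items()}
        for current, followers in counts.items()
    }


def conditional_entropy(words, probabilities):
    """Average uncertainty, in bits, about the next word given the current one."""
    pair_total = len(words) - 1
    source_counts = Counter(words[:-1])  # how often each word starts a pair
    entropy = 0.0
    for current, followers in probabilities.items():
        weight = source_counts[current] / pair_total
        entropy -= weight * sum(p * log2(p) for p in followers.values())
    return entropy


# Toy input (an assumption for illustration, not data from the study).
toy_sequence = "ATGCATGCATGAATGC"
words = split_into_words(toy_sequence, 2)
probabilities = transition_probabilities(words)
entropy_bits = conditional_entropy(words, probabilities)
```

In this simplified picture, each key of `probabilities` plays the role of a network node and each nonzero entry an edge. Replacing individual words with group labels before counting would move the sketch closer to the node-per-group network the abstract describes, but the grouping rule itself is not specified here.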

SUBMITTER: Srivastava S

PROVIDER: S-EPMC4736934 | biostudies-literature

REPOSITORIES: biostudies-literature


Similar Datasets

Project description:

Background: Approximately one-third of Japanese couples currently worry or previously worried about infertility. To develop strategies for the primary prevention of infertility as a population approach, it is important for the general population to be knowledgeable about fertility and infertility. The internet may contribute to the dissemination of information regarding infertility and fertility. However, few studies have examined online information about fertility.

Objective: This study aimed to quantitatively examine online Japanese-language information about lifestyle factors associated with reduced fertility.

Methods: We conducted online searches, using the 10 search terms with the highest numbers of searches that people hoping to conceive are likely to input in two major search engines in Japan (Google Japan and Yahoo! Japan). From the 2200 retrieved websites, 1181 duplicates and 500 websites unrelated to our objective were excluded, resulting in a final dataset of 519 websites. Coding guidelines were developed for the following lifestyle factors associated with reduced fertility: sexually transmitted diseases, psychological stress, cigarette smoking, alcohol use, nutrition and diet, physical activity and exercise, underweight, overweight and obesity, and environmental pollutants.

Results: In terms of the website author's professional expertise, 69.6% of the coding instances for the selected lifestyle factors were mentioned by hospitals, clinics, or the media, whereas only 1.7% were mentioned by laypersons. Psychological stress (20.1%) and sexually transmitted diseases (18.8%) were the most frequently mentioned lifestyle factors associated with reduced fertility. In contrast, cigarette smoking, alcohol use, nutrition and diet, physical activity and exercise, underweight, overweight and obesity, and environmental pollutants were mentioned relatively infrequently. The association between reduced fertility and sexually transmitted diseases was mentioned significantly more frequently by hospitals and clinics than by the media (P<.001). The association between reduced fertility and nutrition and diet was mentioned significantly more frequently by the media than by hospitals and clinics (P=.008). With regard to the sex of the target audience for the information, female-specific references to psychological stress, sexually transmitted diseases, nutrition and diet, underweight, physical activity and exercise, and overweight and obesity were significantly more frequent than were male-specific references to these lifestyle factors (psychological stress: P=.002, sexually transmitted diseases: P<.001, nutrition and diet: P<.001, underweight: P<.001, physical activity and exercise: P<.001, overweight and obesity: P<.001).

Conclusions: Of the lifestyle factors known to be related to reduced fertility, cigarette smoking, alcohol use, and male-specific lifestyle factors are mentioned relatively infrequently in online information sources in Japan, and these factors should be discussed more in information published on websites.
