Dataset Information

MODER2: first-order Markov modeling and discovery of monomeric and dimeric binding motifs.

ABSTRACT: MOTIVATION: Position-specific probability matrices (PPMs, also called position-specific weight matrices) have been the dominating model for transcription factor (TF)-binding motifs in DNA. There is, however, increasing recent evidence of better performance of higher order models such as Markov models of order one, also called adjacent dinucleotide matrices (ADMs). ADMs can model dependencies between adjacent nucleotides, unlike PPMs. A modeling technique and software tool that would estimate such models simultaneously both for monomers and their dimers have been missing. RESULTS: We present an ADM-based mixture model for monomeric and dimeric TF-binding motifs and an expectation maximization algorithm MODER2 for learning such models from training data and seeds. The model is a mixture that includes monomers and dimers, built from the monomers, with a description of the dimeric structure (spacing, orientation). The technique is modular, meaning that the co-operative effect of dimerization is made explicit by evaluating the difference between expected and observed models. The model is validated using HT-SELEX and generated datasets, and by comparing to some earlier PPM and ADM techniques. The ADM models explain data slightly better than PPM models for 314 tested TFs (or their DNA-binding domains) from four families (bHLH, bZIP, ETS and Homeodomain), the ADM mixture models by MODER2 being the best on average. AVAILABILITY AND IMPLEMENTATION: Software implementation is available from https://github.com/jttoivon/moder2. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
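The difference between a PPM and an ADM described in the abstract can be illustrated with a small sketch. This is not MODER2's implementation and does not cover the mixture model, the dimer structure, or the EM algorithm; it only contrasts how the two model types assign a probability to a site. All names (`ppm_probability`, `adm_probability`, `adm_to_ppm`, `INITIAL`, `TRANSITIONS`, `PPM`) and the toy two-position motif are assumptions made for illustration.

```python
from math import prod

BASES = "ACGT"

def ppm_probability(seq, ppm):
    """Probability of seq under a PPM: every position is independent."""
    return prod(ppm[i][base] for i, base in enumerate(seq))

def adm_probability(seq, initial, transitions):
    """Probability of seq under a first-order Markov model (ADM).

    initial[b] is the probability of base b at position 0;
    transitions[i][a][b] is P(base b at position i+1 | base a at position i).
    """
    p = initial[seq[0]]
    for i in range(len(seq) - 1):
        p *= transitions[i][seq[i]][seq[i + 1]]
    return p

def adm_to_ppm(initial, transitions):
    """Collapse an ADM to per-position base frequencies (its PPM marginals)."""
    ppm = [dict(initial)]
    for t in transitions:
        prev = ppm[-1]
        ppm.append({b: sum(prev[a] * t[a][b] for a in BASES) for b in BASES})
    return ppm

# Hypothetical toy motif of length 2: sites are "AT" or "CG", each half the time.
UNIFORM = {b: 0.25 for b in BASES}
INITIAL = {"A": 0.5, "C": 0.5, "G": 0.0, "T": 0.0}
TRANSITIONS = [{
    "A": {"A": 0.0, "C": 0.0, "G": 0.0, "T": 1.0},  # A is always followed by T
    "C": {"A": 0.0, "C": 0.0, "G": 1.0, "T": 0.0},  # C is always followed by G
    "G": UNIFORM,  # never reached from INITIAL; placeholder rows
    "T": UNIFORM,
}]
PPM = adm_to_ppm(INITIAL, TRANSITIONS)

for site in ("AT", "CG", "AG", "CT"):
    print(site, adm_probability(site, INITIAL, TRANSITIONS), ppm_probability(site, PPM))
```

In this toy case the ADM puts all probability on the two real sites, while the PPM built from the same marginals cannot tell "AT" from "AG", because it discards the dependency between adjacent positions.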

SUBMITTER: Toivonen J

PROVIDER: S-EPMC7203737 | biostudies-literature | 2020 May

REPOSITORIES: biostudies-literature


Publications

MODER2: first-order Markov modeling and discovery of monomeric and dimeric binding motifs.

Jarkko Toivonen, Pratyush K Das, Jussi Taipale, Esko Ukkonen

Bioinformatics (Oxford, England), 2020-05-01, issue 9


PMID: 31999322

Similar Datasets

Project description: Molecular interactions between viral DNA and viral-encoded protein are a prerequisite for successful herpesvirus replication and production of new infectious virions. Here, we examined how the essential Kaposi's sarcoma-associated herpesvirus (KSHV) protein, RTA, binds to viral DNA using transmission electron microscopy (TEM). Previous studies using gel-based approaches to characterize RTA binding are important for studying the predominant form(s) of RTA within a population and identifying the DNA sequences that RTA binds with high affinity. However, using TEM we were able to examine individual protein-DNA complexes and capture the various oligomeric states of RTA when bound to DNA. Hundreds of images of individual DNA and protein molecules were collected and then quantified to map the DNA binding positions of RTA bound to the two KSHV lytic origins of replication encoded within the KSHV genome. The relative sizes of RTA alone or RTA bound to DNA were then compared to protein standards to determine whether RTA complexed with DNA was monomeric, dimeric, or formed larger oligomeric structures. We successfully analyzed a highly heterogeneous dataset and identified new binding sites for RTA. This provides direct evidence that RTA forms dimers and higher-order multimers when bound to KSHV origin of replication DNA sequences. This work expands our understanding of RTA binding, and demonstrates the importance of employing methodologies that can characterize highly heterogeneous populations of proteins.

Importance: Kaposi's sarcoma-associated herpesvirus (KSHV) is a human herpesvirus associated with several human cancers, typically in patients with compromised immune systems. Herpesviruses establish lifelong infections in hosts in part due to the two phases of infection: the dormant and active phases. Effective antiviral treatments to prevent the production of new viruses are needed to treat KSHV. A detailed microscopy-based investigation of the molecular interactions between viral protein and viral DNA revealed how protein-protein interactions play a role in DNA binding specificity. This analysis will lead to a more in-depth understanding of KSHV DNA replication and serve as the basis for antiviral therapies that disrupt and prevent the protein-DNA interactions, thereby decreasing spread to new hosts.

