Project description: Visualizing data through graphs can be an effective way to communicate one's results. A ubiquitous graph and common technique to communicate behavioral data is the bar graph. The bar graph was invented in 1786 and little has changed in its format since. Here, a replacement for the bar graph is proposed. The new format, called a hat graph, maintains some of the critical features of the bar graph such as its discrete elements, but eliminates redundancies that are problematic when the baseline is not at zero. Hat graphs also include design elements based on Gestalt principles of grouping and graph design principles. The effectiveness of the hat graph was tested in five empirical studies. Participants were nearly 40% faster to find and identify the condition that led to the biggest difference from baseline to final test when the data were plotted with hat graphs than with bar graphs. Participants were also more sensitive to the magnitude of an effect plotted with a hat graph compared with a bar graph that was restricted to having its baseline at zero. The recommendation is to use hat graphs when plotting data from discrete categories.
Project description: Modern problems of concept annotation associate an object of interest (gene, individual, text document) with a set of interrelated textual descriptors (functions, diseases, topics), often organized in concept hierarchies or ontologies. Most ontologies can be seen as directed acyclic graphs (DAGs), where nodes represent concepts and edges represent relational ties between these concepts. Given an ontology graph, each object can only be annotated by a consistent sub-graph; that is, a sub-graph such that if an object is annotated by a particular concept, it must also be annotated by all other concepts that generalize it. Ontologies therefore provide a compact representation of a large space of possible consistent sub-graphs; however, until now we have not been aware of a practical algorithm that can enumerate such annotation spaces for a given ontology. We propose an algorithm for enumerating consistent sub-graphs of DAGs. The algorithm recursively partitions the graph into strictly smaller graphs until the resulting graph becomes a rooted tree (forest), for which a linear-time solution is computed. It then combines the tallies from graphs created in the recursion to obtain the final count. We prove the correctness of this algorithm, propose several practical accelerations, evaluate it on random graphs and then apply it to characterize four major biomedical ontologies. We believe this work provides valuable insights into the complexity of concept annotation spaces and its potential influence on the predictability of ontological annotation. https://github.com/shawn-peng/counting-consistent-sub-DAG. Supplementary data are available at Bioinformatics online.
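The tree base case mentioned above can be illustrated with a small sketch. It is not the authors' implementation: the function names, the dictionary-based graph encoding, and the choice to count only non-empty sub-graphs are assumptions made for illustration. For a rooted tree, the number of consistent sub-graphs containing a node is the product, over its children, of one plus the child's own count. A brute-force enumerator over a small DAG is included only as a cross-check.

```python
def count_consistent_tree(children, root):
    """Count ancestor-closed node sets that contain `root` in a rooted tree.

    `children` maps a node to a list of its children (hypothetical encoding).
    Uses one pass over the nodes, children before parents.
    """
    order, stack = [], [root]
    while stack:                      # iterative pre-order traversal
        v = stack.pop()
        order.append(v)
        stack.extend(children.get(v, []))
    g = {}
    for v in reversed(order):         # reversed pre-order: children first
        prod = 1
        for c in children.get(v, []):
            prod *= 1 + g[c]          # either skip c's subtree or pick one of g[c] options
        g[v] = prod
    return g[root]


def count_consistent_bruteforce(parents):
    """Count non-empty ancestor-closed node sets of a small DAG by enumeration.

    `parents` maps every node to the list of its direct generalizations.
    Exponential in the number of nodes; intended only for checking tiny cases.
    """
    nodes = list(parents)
    total = 0
    for mask in range(1, 1 << len(nodes)):
        chosen = {nodes[i] for i in range(len(nodes)) if mask >> i & 1}
        if all(p in chosen for v in chosen for p in parents[v]):
            total += 1
    return total


tree_children = {"r": ["a", "b"], "a": ["c"]}
tree_parents = {"r": [], "a": ["r"], "b": ["r"], "c": ["a"]}
print(count_consistent_tree(tree_children, "r"))   # tree recurrence
print(count_consistent_bruteforce(tree_parents))   # same tree, enumerated
```

On a single-rooted tree the two functions agree, because every non-empty consistent sub-graph must contain the root. The recurrence does not carry over directly to general DAGs, where a node can have several parents; that is the case the abstract's recursive partitioning addresses.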
Project description: Test compound one, 5,6-benzoflavone (BNF), was known to act through both the Ah receptor and Nrf2 receptor pathways, while test compounds two and three, 3H-1,2-dithiole-3-thione (D3T) and 4-methyl-5-pyrazinyl-3H-1,2-dithiole-3-thione (OLT), were known to act through the Nrf2 receptor pathway. Furthermore, D3T is known to be more potent and efficacious than OLT for Nrf2 activation. OLT has been shown to exhibit 20-50% of the efficacy of D3T for inhibition of aflatoxin-induced hepatic foci. Nonetheless, because OLT is an approved drug, it is currently being evaluated in human phase II intervention trials of biomarkers of aflatoxin-related hepatocellular carcinoma. More recently, BNF was shown to be an effective chemopreventive agent in the rat mammary carcinogen model, inhibiting 7,12-dimethylbenz(a)anthracene DNA adduct formation in liver and mammary cells by 96% and 83%, respectively. We used microarrays to study the structure activities that lie within the test compounds. Keywords: treatment effect study
Project description: The structure of RNA has been a natural subject for mathematical modeling, inviting many innovative computational frameworks. This single-stranded polynucleotide chain can fold upon itself in numerous ways to form hydrogen-bonded segments, imperfect with single-stranded loops. Illustrating these paired and non-paired interaction networks, known as RNA's secondary (2D) structure, using mathematical graph objects has been illuminating for RNA structure analysis. Building upon such seminal work from the 1970s and 1980s, graph models are now used not only to study RNA structure but also to describe RNA's recurring modular units, sample the conformational space accessible to RNAs, predict RNA's three-dimensional folds, and apply the combined aspects to novel RNA design. In this article, we outline the development of the RNA-As-Graphs (or RAG) approach and highlight current applications to RNA structure prediction and design.
Project description: Graph theoretical concepts are useful for the description and analysis of interactions and relationships in biological systems. We give a brief introduction to some of the concepts and their areas of application in molecular biology. We discuss software that is available through the Bioconductor project and present a simple example application to the integration of a protein-protein interaction and a co-expression network.
Project description: Motivation: Pangenome graphs provide a complete representation of the mutual alignment of collections of genomes. These models offer the opportunity to study the entire genomic diversity of a population, including structurally complex regions. Nevertheless, analyzing hundreds of gigabase-scale genomes using pangenome graphs is difficult as it is not well-supported by existing tools. Hence, fast and versatile software is required to ask advanced questions to such data in an efficient way. Results: We wrote ODGI, a novel suite of tools that implements scalable algorithms and has an efficient in-memory representation of DNA pangenome graphs in the form of variation graphs. ODGI supports pre-built graphs in the Graphical Fragment Assembly format. ODGI includes tools for detecting complex regions, extracting pangenomic loci, removing artifacts, exploratory analysis, manipulation, validation, and visualization. Its fast parallel execution facilitates routine pangenomic tasks, as well as pipelines that can quickly answer complex biological questions of gigabase-scale pangenome graphs. Availability: ODGI is published as free software under the MIT open source license. Source code can be downloaded from https://github.com/pangenome/odgi and documentation is available at https://odgi.readthedocs.io. ODGI can be installed via Bioconda https://bioconda.github.io/recipes/odgi/README.html or GNU Guix https://github.com/pangenome/odgi/blob/master/guix.scm.
Project description: Graph databases are constantly growing, and, at the same time, some of their data is the same or similar. Our experience with the management of the existing databases, especially the bigger ones, shows that certain vertices in particular are replicated numerous times. Eliminating repetitive or even very similar data speeds up the access to database resources. We present a modification of this approach, where similarly we group together vertices of identical properties, but then additionally we join together groups of data that are located in distant parts of a graph. The second part of our approach is non-trivial. We show that the search for a partition of a given graph where each member of the partition has only pairwise distant vertices is NP-hard. We indicate a group of heuristics that try to solve our difficult computational problems and then we apply them to check the effectiveness of our approach.
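To make the partition problem above concrete, here is a minimal first-fit sketch. It is an assumption-laden illustration, not one of the heuristics from the paper: the function names, the adjacency-dictionary encoding, and the use of hop distance with a `min_dist` threshold as the meaning of "distant" are all hypothetical. Each vertex is placed into the first group that contains no vertex closer than `min_dist` hops.

```python
from collections import deque


def vertices_within(adj, src, limit):
    """Return vertices at hop distance < limit from src (src itself excluded)."""
    dist = {src: 0}
    queue = deque([src])
    while queue:
        u = queue.popleft()
        if dist[u] + 1 >= limit:      # neighbours would already be too far
            continue
        for w in adj[u]:
            if w not in dist:
                dist[w] = dist[u] + 1
                queue.append(w)
    del dist[src]
    return set(dist)


def greedy_distant_partition(adj, min_dist):
    """First-fit partition: vertices in one group are pairwise >= min_dist apart."""
    near = {v: vertices_within(adj, v, min_dist) for v in adj}
    groups = []
    for v in adj:                     # visiting order follows dict insertion order
        for group in groups:
            if not (near[v] & group):
                group.add(v)
                break
        else:
            groups.append({v})
    return groups


# Undirected path 0-1-2-3-4-5 as an adjacency dictionary.
path = {i: [j for j in (i - 1, i + 1) if 0 <= j <= 5] for i in range(6)}
print(greedy_distant_partition(path, min_dist=3))
```

First-fit gives no guarantee on the number of groups, which is in keeping with the NP-hardness result stated in the abstract; the result also depends on the order in which vertices are visited.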
Project description: Matrix-valued data, where the sampling unit is a matrix consisting of rows and columns of measurements, are emerging in numerous scientific and business applications. The matrix Gaussian graphical model is a useful tool to characterize the conditional dependence structure of rows and columns. In this article, we employ nonconvex penalization to tackle the estimation of multiple graphs from matrix-valued data under a matrix normal distribution. We propose a highly efficient nonconvex optimization algorithm that can scale up for graphs with hundreds of nodes. We establish the asymptotic properties of the estimator, which requires less stringent conditions and has a sharper probability error bound than existing results. We demonstrate the efficacy of our proposed method through both simulations and real functional magnetic resonance imaging analyses.
Project description: Motivation: In 2012, Iqbal et al. introduced the colored de Bruijn graph, a variant of the classic de Bruijn graph, which is aimed at 'detecting and genotyping simple and complex genetic variants in an individual or population'. Because they are intended to be applied to massive population level data, it is essential that the graphs be represented efficiently. Unfortunately, current succinct de Bruijn graph representations are not directly applicable to the colored de Bruijn graph, which requires additional information to be succinctly encoded as well as support for non-standard traversal operations. Results: Our data structure dramatically reduces the amount of memory required to store and use the colored de Bruijn graph, with some penalty to runtime, allowing it to be applied in much larger and more ambitious sequence projects than was previously possible. Availability and Implementation: https://github.com/cosmo-team/cosmo/tree/VARI. Contact: martin.muggli@colostate.edu. Supplementary information: Supplementary data are available at Bioinformatics online.
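For readers unfamiliar with the colored de Bruijn graph, the following toy sketch shows the information it carries: each k-mer (edge) is tagged with the set of samples ("colors") in which it occurs, and traversal can be restricted to one color. It uses a plain Python dictionary, so it is the opposite of succinct and has nothing to do with the data structure the abstract proposes; the function names and the sample identifiers are hypothetical.

```python
from collections import defaultdict


def build_colored_dbg(samples, k):
    """Map every k-mer to the set of sample ids (colors) that contain it."""
    colors = defaultdict(set)
    for color, seq in samples.items():
        for i in range(len(seq) - k + 1):
            colors[seq[i:i + k]].add(color)
    return dict(colors)


def successors(colors, node, color=None):
    """List k-mers leaving `node` (a (k-1)-mer), optionally for one color only."""
    out = []
    for base in "ACGT":
        kmer = node + base
        if kmer in colors and (color is None or color in colors[kmer]):
            out.append(kmer)
    return out


# Two toy samples that share a prefix and then diverge.
graph = build_colored_dbg({"s1": "ACGTA", "s2": "ACGGA"}, k=3)
print(graph["ACG"])                          # shared by both samples
print(successors(graph, "CG"))               # branch point: both continuations
print(successors(graph, "CG", color="s1"))   # only the path followed by s1
```

The color-restricted traversal is an example of the kind of non-standard operation that, per the abstract, existing succinct de Bruijn graph representations do not directly support.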
Project description: Geometric approaches to network analysis combine simply defined models with great descriptive power. In this work we provide a method for embedding directed acyclic graphs (DAGs) into Minkowski spacetime using multidimensional scaling (MDS). First, we generalise the classical MDS algorithm, defined only for metrics with a Riemannian signature, to manifolds of any metric signature. We then use this general method to develop an algorithm which exploits the causal structure of a DAG to assign space and time coordinates in a Minkowski spacetime to each vertex. As in the causal set approach to quantum gravity, causal connections in the discrete graph correspond to timelike separation in the continuous spacetime. The method is demonstrated by calculating embeddings for simple models of causal sets and random DAGs, as well as real citation networks. We find that the citation networks we test yield significantly more accurate embeddings than random DAGs of the same size. Finally, we suggest a number of applications in citation analysis such as paper recommendation, identifying missing citations and fitting citation models to data using this geometric approach.
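One way to picture MDS with an indefinite signature is sketched below, assuming NumPy is available. This is a reading of the idea rather than the authors' algorithm: the function names are hypothetical, and the rule of taking the most negative eigenvalue for the time axis and the largest positive ones for space is an assumption. The input is a matrix of squared intervals (negative for timelike pairs); how such intervals would be derived from a DAG is not covered here.

```python
import numpy as np


def squared_intervals(coords):
    """Minkowski squared intervals; column 0 is time, the rest are space."""
    diff = coords[:, None, :] - coords[None, :, :]
    return -diff[..., 0] ** 2 + (diff[..., 1:] ** 2).sum(axis=-1)


def lorentzian_mds(sq_intervals, space_dims=1):
    """Recover (time, space...) coordinates from a matrix of squared intervals."""
    m = np.asarray(sq_intervals, dtype=float)
    n = m.shape[0]
    centering = np.eye(n) - np.ones((n, n)) / n
    gram = -0.5 * centering @ m @ centering      # double centering, as in classical MDS
    eigvals, eigvecs = np.linalg.eigh(gram)      # eigenvalues in ascending order
    # Assumption: most negative eigenvalue -> time, largest positive -> space.
    idx = [0] + list(range(n - 1, n - 1 - space_dims, -1))
    return eigvecs[:, idx] * np.sqrt(np.abs(eigvals[idx]))


# Round trip on random points in 1+2 dimensional Minkowski space.
rng = np.random.default_rng(0)
points = rng.normal(size=(8, 3))
target = squared_intervals(points)
embedded = lorentzian_mds(target, space_dims=2)
print(np.allclose(squared_intervals(embedded), target))
```

The recovered coordinates generally differ from the originals by a translation and a Lorentz transformation, so the check compares squared intervals rather than coordinates.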