Project description:Knowledge graphs (KGs) are becoming increasingly important in the biomedical field. Deriving new and reliable knowledge from existing knowledge through knowledge graph embedding is a cutting-edge approach. Some methods add a variety of additional information to aid reasoning, which is known as multimodal reasoning. However, few works based on existing biomedical KGs focus on specific diseases. This work develops a construction and multimodal reasoning process for Specific Disease Knowledge Graphs (SDKGs). We construct SDKG-11, an SDKG set including five cancers, six non-cancer diseases, a combined Cancer5, and a combined Diseases11, aiming to discover new reliable knowledge and provide universal pre-trained knowledge for that specific disease field. SDKG-11 is obtained through original triplet extraction, standard entity set construction, entity linking, and relation linking. We implement multimodal reasoning by reverse-hyperplane projection for SDKGs based on structure, category, and description embeddings. With the entity prediction task as the evaluation protocol, multimodal reasoning improves pre-existing models on all SDKGs. We verify the model's reliability in discovering new knowledge by manually proofreading predicted drug-gene, gene-disease, and disease-drug pairs. Using the embedding results as initialization parameters for biomolecular interaction classification, we demonstrate the universality of the embedding models. The constructed SDKG-11 and the TensorFlow implementation are available from https://github.com/ZhuChaoY/SDKG-11. Supplementary data are available at Bioinformatics online.
Project description:Motivation: Synthetic lethality (SL) is a promising strategy for anticancer therapy, as inhibiting SL partners of genes with cancer-specific mutations can selectively kill cancer cells without harming normal cells. Wet-lab techniques for SL screening suffer from issues such as high cost and off-target effects. Computational methods can help address these issues. Previous machine learning methods leverage known SL pairs, and the use of knowledge graphs (KGs) can significantly enhance prediction performance. However, the subgraph structures of KGs have not been fully explored. In addition, most machine learning methods lack interpretability, which is an obstacle to the wide application of machine learning to SL identification. Results: We present a model named KR4SL to predict SL partners for a given primary gene. It captures the structural semantics of a KG by efficiently constructing and learning from relational digraphs in the KG. To encode the semantic information of the relational digraphs, we fuse textual semantics of entities into propagated messages and enhance the sequential semantics of paths using a recurrent neural network. Moreover, we design an attentive aggregator to identify, as explanations, the critical subgraph structures that contribute the most to the SL prediction. Extensive experiments under different settings show that KR4SL significantly outperforms all the baselines. The explanatory subgraphs for the predicted gene pairs can unveil the prediction process and the mechanisms underlying synthetic lethality. The improved predictive power and interpretability indicate that deep learning is practically useful for SL-based cancer drug target discovery. Availability and implementation: The source code is freely available at https://github.com/JieZheng-ShanghaiTech/KR4SL.
Project description:Background: Ontologies are representations of a conceptualization of a domain. Traditionally, ontologies in biology were represented as directed acyclic graphs (DAGs) that represent the backbone taxonomy and additional relations between classes. These graphs are widely exploited for data analysis in the form of ontology enrichment or computation of semantic similarity. More recently, ontologies have been developed in a formal language such as the Web Ontology Language (OWL) and consist of a set of axioms through which classes are defined or constrained. While the taxonomy of an ontology can be inferred directly from its axioms as one of the standard OWL reasoning tasks, creating general graph structures from OWL ontologies that exploit the ontologies' semantic content remains a challenge. Results: We developed a method to transform ontologies into graphs using an automated reasoner while taking into account all relations between classes. By searching for (existential) patterns in the deductive closure of ontologies, we can identify relations between classes that are implied but not asserted and generate graph structures that encode a large part of the ontologies' semantic content. We demonstrate the advantages of our method by applying it to the inference of protein-protein interactions through semantic similarity over the Gene Ontology, and show that performance increases when graph structures are inferred using deductive inference according to our method. Our software and experiment results are available at http://github.com/bio-ontology-research-group/Onto2Graph . Conclusions: Onto2Graph is a method to generate graph structures from OWL ontologies using automated reasoning. The resulting graphs can be used for improved ontology visualization and ontology-based data analysis.
Project description:Temporal knowledge graphs (TKGs) are critical tools for capturing the dynamic nature of facts that evolve over time, making them highly valuable in a broad spectrum of intelligent applications. In temporal knowledge graph extrapolation reasoning, predicting future occurrences is both highly significant and challenging. While current models consider how facts change over time and recognize that historical facts may recur, they often overlook the influence of past events on future predictions. Motivated by these considerations, this work introduces a novel temporal knowledge graph reasoning model, named Temporal Reasoning with Recurrent Encoding and Contrastive Learning (TRCL), which integrates recurrent encoding and contrastive learning techniques. The proposed model captures the evolution of historical facts, generating representations of entities and relations through recurrent encoding. Additionally, TRCL incorporates a global historical matrix to account for repeated historical occurrences and employs contrastive learning to alleviate the interference of historical facts in predicting future events. The TKG reasoning outcomes are subsequently derived through a time decoder. Extensive experiments on four benchmark datasets show that the proposed TRCL model surpasses state-of-the-art TKG reasoning models across a range of metrics. Compared with the strong baseline Time-Guided Recurrent Graph Network (TiRGN), TRCL achieves a 1.03% improvement in mean reciprocal rank (MRR) on ICEWS14. The proposed method not only enhances the accuracy of TKG extrapolation but also sets a new standard for robustness in dynamic knowledge graph applications, paving the way for future research and practical applications in predictive intelligence systems.
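Illustrative note: the MRR metric cited in the abstract above can be sketched as follows. This is a minimal sketch of the standard definition (the mean of 1/rank of the correct entity over all test queries), not the TRCL evaluation code; the function names and the example ranks are hypothetical, and Hits@k is included only as a commonly reported companion metric.

```python
from typing import Sequence


def mean_reciprocal_rank(ranks: Sequence[int]) -> float:
    """Mean of 1/rank over all queries; ranks are 1-based positions
    of the correct entity in each model-produced ranking."""
    if not ranks:
        raise ValueError("ranks must be non-empty")
    if any(r < 1 for r in ranks):
        raise ValueError("ranks are 1-based and must be >= 1")
    return sum(1.0 / r for r in ranks) / len(ranks)


def hits_at_k(ranks: Sequence[int], k: int) -> float:
    """Fraction of queries whose correct entity is ranked in the top k."""
    if not ranks:
        raise ValueError("ranks must be non-empty")
    return sum(1 for r in ranks if r <= k) / len(ranks)


# Hypothetical ranks for three test queries (not taken from any dataset).
example_ranks = [1, 2, 4]
example_mrr = mean_reciprocal_rank(example_ranks)   # (1 + 1/2 + 1/4) / 3
example_hits3 = hits_at_k(example_ranks, 3)         # 2 of 3 queries
```

Because MRR averages reciprocal ranks, it rewards placing the correct entity near the top far more than it penalizes differences among low ranks.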
Project description:Background: Multimorbidity and frailty represent emerging global health burdens that have garnered increased attention from researchers over the past two decades. We conducted a scientometric analysis of the scientific literature on the coexistence of multimorbidity and frailty to assess major research domains and trends and to inform future lines of research. Methods: We systematically retrieved scientific publications on multimorbidity and frailty from the Web of Science Core Collection, spanning 2003 to 2023. Scientometric analysis was performed using CiteSpace and VOSviewer, enabling the visualization and evaluation of networks comprising co-cited references, co-occurring keywords, countries, institutions, authors, and journals. Results: A total of 584 eligible publications were included in the analysis. An exponential rise in research interest in multimorbidity and frailty was observed, with an average annual growth rate of 47.92% in publications between 2003 and 2022. Three major research trends were identified: standardized definition and measurement of multimorbidity and frailty, comprehensive geriatric assessment utilizing multimorbidity and frailty instruments for older adults, and the multifaceted associations between these two conditions. The United States of America, Johns Hopkins University, Fried LP, and the Journal of the American Geriatrics Society were identified as the most influential entities within this field, representing the leading country, institution, author, and journal, respectively. Conclusions: Scientometric analysis provides invaluable insights to clinicians and researchers involved in multimorbidity and frailty research by identifying intellectual bases and research trends.
While the instruments and assessments of multimorbidity and frailty with scientific validity and reliability are of undeniable importance, further investigations are also warranted to unravel the underlying biological mechanisms of interactions between multimorbidity and frailty, explore the mental health aspects among older individuals with multimorbidity and frailty, and refine strategies to reduce prescriptions in this specific population.
Project description:We applied an innovation framework to sustainable livestock development research projects in Africa and Asia. The focus of these projects ranged from pastoral systems to poverty and ecosystems services mapping to market access by the poor to fodder and natural resource management to livestock parasite drug resistance. We found that these projects closed gaps between knowledge and action by combining different kinds of knowledge, learning, and boundary spanning approaches; by providing all partners with the same opportunities; and by building the capacity of all partners to innovate and communicate.
Project description:"Knowledge graphs" (KGs) have become a common approach for representing biomedical knowledge. In a KG, multiple biomedical data sets can be linked together as a graph representation, with nodes representing entities, such as "chemical substance" or "genes," and edges representing predicates, such as "causes" or "treats." Reasoning and inference algorithms can then be applied to the KG and used to generate new knowledge. We developed three KG-based question-answering systems as part of the Biomedical Data Translator program. These systems are typically tested and evaluated using traditional software engineering tools and approaches. In this study, we explored a team-based approach to test and evaluate the prototype "Translator Reasoners" through the application of Medical College Admission Test (MCAT) questions. Specifically, we describe three "hackathons," in which the developers of each of the three systems worked together with a moderator to determine whether the applications could be used to solve MCAT questions. The results demonstrate progressive improvement in system performance, with 0% (0/5) correct answers during the first hackathon, 75% (3/4) correct during the second hackathon, and 100% (5/5) correct during the final hackathon. We discuss the technical and sociologic lessons learned and conclude that MCAT questions can be applied successfully in the context of moderated hackathons to test and evaluate prototype KG-based question-answering systems, identify gaps in current capabilities, and improve performance. Finally, we highlight several published clinical and translational science applications of the Translator Reasoners.
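Illustrative note: the node/edge/predicate structure described in the abstract above can be sketched as a small in-memory graph. This is a minimal sketch under stated assumptions, not the Translator Reasoners' implementation; the class, the method names, the "affects" predicate, and the placeholder entities (chemical_A, gene_B, disease_C) are all hypothetical and carry no biomedical meaning.

```python
from collections import defaultdict
from typing import Dict, List, NamedTuple, Tuple


class Edge(NamedTuple):
    subject: str
    predicate: str
    obj: str


class KnowledgeGraph:
    """Tiny KG: nodes carry a category, edges carry a predicate."""

    def __init__(self) -> None:
        self.categories: Dict[str, str] = {}
        self._out: Dict[str, List[Edge]] = defaultdict(list)

    def add_node(self, node_id: str, category: str) -> None:
        self.categories[node_id] = category

    def add_edge(self, subject: str, predicate: str, obj: str) -> None:
        # Both endpoints must already exist as nodes.
        for node_id in (subject, obj):
            if node_id not in self.categories:
                raise KeyError(f"unknown node: {node_id}")
        self._out[subject].append(Edge(subject, predicate, obj))

    def objects(self, subject: str, predicate: str) -> List[str]:
        """One-hop lookup: all objects linked from subject by predicate."""
        return [e.obj for e in self._out.get(subject, []) if e.predicate == predicate]

    def two_hop(self, start: str, first: str, second: str) -> List[Tuple[str, str, str]]:
        """Simple path inference: start -first-> mid -second-> end."""
        return [
            (start, mid, end)
            for mid in self.objects(start, first)
            for end in self.objects(mid, second)
        ]


# Placeholder data only; these are not real biomedical assertions.
kg = KnowledgeGraph()
kg.add_node("chemical_A", "chemical substance")
kg.add_node("gene_B", "gene")
kg.add_node("disease_C", "disease")
kg.add_edge("chemical_A", "affects", "gene_B")
kg.add_edge("gene_B", "causes", "disease_C")

paths = kg.two_hop("chemical_A", "affects", "causes")
```

Chaining two one-hop lookups is the simplest form of the reasoning the abstract mentions: a path that is not stored as a single edge is recovered by traversing existing edges.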
Project description:Background: Limited knowledge and unclear underlying biology of many rare diseases pose significant challenges to patients, clinicians, and scientists. To address these challenges, there is an urgent need to inspire and encourage scientists to propose and pursue innovative research studies that aim to uncover the genetic and molecular causes of more rare diseases and ultimately to identify effective therapeutic solutions. A clear understanding of current research efforts, knowledge and research gaps, and funding patterns as scientific evidence is crucial to systematically accelerating the pace of research discovery in rare diseases, which is the overarching goal of this study. Methods: To semantically represent NIH funding data for rare diseases and advance its use in promoting rare disease research, we identified NIH-funded projects for rare diseases by mapping GARD diseases to projects based on project titles; subsequently, we represented and managed the identified projects in a knowledge graph using Neo4j software, hosted at NCATS, based on a predefined data model that captures semantics among the data. With this knowledge graph, we were able to perform several case studies to demonstrate scientific evidence generation in support of rare disease research discovery. Results: Of 5001 rare diseases belonging to 32 distinct disease categories, we identified 1294 diseases that are mapped to 45,647 distinct NIH-funded projects obtained from the NIH ExPORTER by implementing semantic annotation of project titles. To capture the semantic relationships present among the mapped research funding data, we defined a data model comprising seven primary classes and corresponding object and data properties.
A Neo4j knowledge graph based on this predefined data model has been developed, and we performed multiple case studies over this knowledge graph to demonstrate its use in directing and promoting rare disease research. Conclusion: We developed an integrative knowledge graph with rare disease funding data and demonstrated its use as a source from which we can effectively identify and generate scientific evidence to support rare disease research. With the success of this preliminary study, we plan to implement advanced computational approaches for analyzing more funding-related data, e.g., project abstracts and PubMed article abstracts, and for linking to other types of biomedical data to perform more sophisticated research gap analysis and identify opportunities for future research in rare diseases.
Project description:Here, we present a protocol for conducting bibliometric analysis in biomedicine using CiteSpace and VOSviewer. We describe the steps for extracting data from Web of Science, data cleaning, and preprocessing. We then detail procedures for identifying research trends and collaboration networks by visualizing data with CiteSpace; mapping co-authorship, co-citation, and keyword co-occurrence using VOSviewer; and analyzing highly cited literature to identify key publications and trends. Finally, we outline techniques for interpreting the visualizations to draw meaningful conclusions about the research landscape. For complete details on the use and execution of this protocol, please refer to Li et al.1.
Project description:Rising global population and climate change realities dictate that growth in agricultural productivity must be accelerated. Results from current traditional research approaches are difficult to extrapolate to all possible fields because they depend on specific combinations of soil type, weather conditions, and background management that are neither applicable nor translatable to all farms. A method that accurately evaluates the effectiveness of infinite cropping system interactions (involving multiple management practices) to increase maize and soybean yield across the US does not exist. Here, we utilize extensive databases and artificial intelligence algorithms and show that complex interactions, which cannot be evaluated in replicated trials, are associated with large crop yield variability and thus with potential for substantial yield increases. Our approach can accelerate agricultural research, identify sustainable practices, and help meet future food demands.