Dataset Information

Diffusion enables integration of heterogeneous data and user-driven learning in a desktop knowledge-base.

ABSTRACT: Integrating reference datasets (e.g. from high-throughput experiments) with unstructured and manually-assembled information (e.g. notes or comments from individual researchers) has the potential to tailor bioinformatic analyses to specific needs and to lead to new insights. However, developing bespoke analysis pipelines from scratch is time-consuming, and general tools for exploring such heterogeneous data are not available. We argue that by treating all data as text, a knowledge-base can accommodate a range of bioinformatic data types and applications. We show that a database coupled to nearest-neighbor algorithms can address common tasks such as gene-set analysis as well as specific tasks such as ontology translation. We further show that a mathematical transformation motivated by diffusion can be effective for exploration across heterogeneous datasets. Diffusion enables the knowledge-base to begin with a sparse query, impute more features, and find matches that would otherwise remain hidden. This can be used, for example, to map multi-modal queries consisting of gene symbols and phenotypes to descriptions of diseases. Diffusion also enables user-driven learning: when the knowledge-base cannot provide satisfactory search results in the first instance, users can improve the results in real-time by adding domain-specific knowledge. User-driven learning has implications for data management, integration, and curation.
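The abstract's central idea, starting from a sparse query, imputing related features, and then finding matches that a plain nearest-neighbor search would miss, can be illustrated with a toy sketch. This is not the authors' implementation: the whitespace tokenizer, the co-occurrence-based one-step "diffusion", the `weight` parameter, and the three example records (including the hypothetical gene symbol `GENE1`) are all assumptions made for illustration.

```python
import math
from collections import Counter, defaultdict

def tokenize(text):
    # Assumption: all data is treated as text and split on whitespace.
    return text.lower().split()

def build_index(docs):
    """Map each record id to a bag-of-words Counter."""
    return {doc_id: Counter(tokenize(text)) for doc_id, text in docs.items()}

def cooccurrence(index):
    """Row-normalised token co-occurrence across all records."""
    co = defaultdict(Counter)
    for bag in index.values():
        for a in bag:
            for b in bag:
                if a != b:
                    co[a][b] += 1
    return {a: {b: n / sum(row.values()) for b, n in row.items()}
            for a, row in co.items()}

def diffuse(query, co, weight=0.5):
    """One diffusion step: spread query weight onto co-occurring tokens."""
    out = Counter(query)
    for token, value in query.items():
        for other, p in co.get(token, {}).items():
            out[other] += weight * value * p
    return out

def cosine(a, b):
    dot = sum(v * b.get(k, 0.0) for k, v in a.items())
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def search(query_text, index, co=None, weight=0.5):
    """Rank records by cosine similarity; diffuse the query if co is given."""
    query = Counter(tokenize(query_text))
    if co is not None:
        query = diffuse(query, co, weight)
    scores = {doc_id: cosine(query, bag) for doc_id, bag in index.items()}
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)

# Hypothetical records: two reference entries and one user-added note.
docs = {
    "disease_A": "seizures ataxia",
    "disease_B": "anemia fatigue",
    "gene_note": "GENE1 seizures",  # user note linking a gene to a phenotype
}
index = build_index(docs)
co = cooccurrence(index)

plain = dict(search("GENE1", index))         # no diffusion
diffused = dict(search("GENE1", index, co))  # with diffusion
```

In this toy setup the plain search for `GENE1` gives `disease_A` a score of zero, because the two share no tokens. After diffusion the query also carries weight on `seizures`, so `disease_A` receives a positive score while the unrelated `disease_B` stays at zero. The user-added note is what makes the link possible, which mirrors the user-driven learning the abstract describes.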

SUBMITTER: Konopka T

PROVIDER: S-EPMC8382188 | biostudies-literature

REPOSITORIES: biostudies-literature

Similar Datasets

Data-Free Knowledge Distillation for Heterogeneous Federated Learning.

Project description:Federated Learning (FL) is a decentralized machine-learning paradigm in which a global server iteratively aggregates the model parameters of local users without accessing their data. User heterogeneity has imposed significant challenges to FL, which can incur drifted global models that are slow to converge. Knowledge Distillation has recently emerged to tackle this issue, by refining the server model using aggregated knowledge from heterogeneous users, rather than directly aggregating their model parameters. This approach, however, depends on a proxy dataset, making it impractical unless such a prerequisite is satisfied. Moreover, the ensemble knowledge is not fully utilized to guide local model learning, which may in turn affect the quality of the aggregated model. In this work, we propose a data-free knowledge distillation approach to address heterogeneous FL, where the server learns a lightweight generator to ensemble user information in a data-free manner, which is then broadcast to users, regulating local training using the learned knowledge as an inductive bias. Empirical studies powered by theoretical implications show that our approach facilitates FL with better generalization performance using fewer communication rounds, compared with the state-of-the-art.

| S-EPMC9036494 | biostudies-literature

Schema: metric learning enables interpretable synthesis of heterogeneous single-cell modalities.

Project description:A complete understanding of biological processes requires synthesizing information across heterogeneous modalities, such as age, disease status, or gene expression. Technological advances in single-cell profiling have enabled researchers to assay multiple modalities simultaneously. We present Schema, which uses a principled metric learning strategy that identifies informative features in a modality to synthesize disparate modalities into a single coherent interpretation. We use Schema to infer cell types by integrating gene expression and chromatin accessibility data; demonstrate informative data visualizations that synthesize multiple modalities; perform differential gene expression analysis in the context of spatial variability; and estimate evolutionary pressure on peptide sequences.

| S-EPMC8091541 | biostudies-literature

Integrating educational knowledge: reactivation of prior knowledge during educational learning enhances memory integration.

Project description:In everyday life and in education, we continuously build and structure our knowledge. Successful knowledge construction is suggested to happen through reactivation of previously learned information during new learning. This reactivation is presumed to lead to integration of old and new memories and strengthen long-term retention. Additionally, congruency with prior knowledge is shown to enhance subsequent memory. However, it is unknown how subjective reactivation and congruency jointly influence learning in an educational context. In two experiments, we investigated this question using an AB-AC inference paradigm where students were asked to first study an AB (word-picture) and then an AC-association (word-description). BC-associations were either congruent or incongruent and were linked by a common, unknown word (A). During AC-learning, participants were instructed to actively reactivate B (the picture) and report their subjective reactivation strength. Participants were first-year university students studying either psychology or family studies and the stimuli consisted of new information from their curricula. We expected that both reactivation and congruency would enhance subsequent associative memory for the inferred BC-association. This was assessed by cueing participants with C (the description) and asking to freely describe the associated picture. Results show a significant enhancement of both B-reactivation and congruency on associative memory scores in both experiments. Additionally, subjective meta-memory measures exhibited the same effect. These outcomes, showing beneficial effects of both reactivation and congruency on memory formation, can be of interest to educational practice, where effectively building knowledge through reactivation is imperative for success.

| S-EPMC6220240 | biostudies-literature

The Micronutrient Genomics Project: a community-driven knowledge base for micronutrient research.

Project description:Micronutrients influence multiple metabolic pathways including oxidative and inflammatory processes. Optimum micronutrient supply is important for the maintenance of homeostasis in metabolism and, ultimately, for maintaining good health. With advances in systems biology and genomics technologies, it is becoming feasible to assess the activity of single and multiple micronutrients in their complete biological context. Existing research collects fragments of information, which are not stored systematically and are thus not optimally disseminated. The Micronutrient Genomics Project (MGP) was established as a community-driven project to facilitate the development of systematic capture, storage, management, analyses, and dissemination of data and knowledge generated by biological studies focused on micronutrient-genome interactions. Specifically, the MGP creates a public portal and open-source bioinformatics toolbox for all "omics" information and evaluation of micronutrient and health studies. The core of the project focuses on access to, and visualization of, genetic/genomic, transcriptomic, proteomic and metabolomic information related to micronutrients. For each micronutrient, an expert group is or will be established combining the various relevant areas (including genetics, nutrition, biochemistry, and epidemiology). Each expert group will (1) collect all available knowledge, (2) collaborate with bioinformatics teams towards constructing the pathways and biological networks, and (3) publish their findings on a regular basis. The project is coordinated in a transparent manner, regular meetings are organized and dissemination is arranged through tools, a toolbox web portal, a communications website and dedicated publications.

| S-EPMC2989004 | biostudies-literature

Domain knowledge integration into deep learning for typhoon intensity classification.

Project description:In this report, we propose a deep learning technique for high-accuracy estimation of the intensity class of a typhoon from a single satellite image, by incorporating meteorological domain knowledge. By using the Visual Geometric Group's model, VGG-16, with images preprocessed with fisheye distortion, which enhances a typhoon's eye, eyewall, and cloud distribution, we achieved much higher classification accuracy than that of a previous study, even with sequential-split validation. Through comparison of t-distributed stochastic neighbor embedding (t-SNE) plots for the feature maps of VGG with the original satellite images, we also verified that the fisheye preprocessing facilitated cluster formation, suggesting that our model could successfully extract image features related to the typhoon intensity class. Moreover, gradient-weighted class activation mapping (Grad-CAM) was applied to highlight the eye and the cloud distributions surrounding the eye, which are important regions for intensity classification; the results suggest that our model qualitatively gained a viewpoint similar to that of domain experts. A series of analyses revealed that the data-driven approach using only deep learning has limitations, and the integration of domain knowledge could bring new breakthroughs.

| S-EPMC8217498 | biostudies-literature

Synthesize heterogeneous biological knowledge via representation learning for Alzheimer's disease drug repurposing.

Project description:Developing drugs for treating Alzheimer's disease (AD) has been extremely challenging and costly due to limited knowledge of underlying mechanisms and therapeutic targets. To address the challenge in AD drug development, we developed a multi-task deep learning pipeline that learns biological interactions and AD risk genes, then utilizes multi-level evidence on drug efficacy to identify repurposable drug candidates. Using the embedding derived from the model, we ranked drug candidates based on evidence from post-treatment transcriptomic patterns, efficacy in preclinical models, population-based treatment effects, and clinical trials. We mechanistically validated the top-ranked candidates in neuronal cells, identifying drug combinations with efficacy in reducing oxidative stress and safety in maintaining neuronal viability and morphology. Our neuronal response experiments confirmed several biologically efficacious drug combinations. This pipeline showed that harmonizing heterogeneous and complementary data and knowledge, including the human interactome, transcriptome patterns, experimental efficacy, and real-world patient data, sheds light on drug development for complex diseases.

| S-EPMC9804117 | biostudies-literature

Modality Attention and Sampling Enables Deep Learning with Heterogeneous Marker Combinations in Fluorescence Microscopy.

Project description:Fluorescence microscopy allows for a detailed inspection of cells, cellular networks, and anatomical landmarks by staining with a variety of carefully-selected markers visualized as color channels. Quantitative characterization of structures in acquired images often relies on automatic image analysis methods. Despite the success of deep learning methods in other vision applications, their potential for fluorescence image analysis remains underexploited. One reason lies in the considerable workload required to train accurate models, which are normally specific for a given combination of markers, and therefore applicable to a very restricted number of experimental settings. We herein propose Marker Sampling and Excite - a neural network approach with a modality sampling strategy and a novel attention module that together enable (i) flexible training with heterogeneous datasets with combinations of markers and (ii) successful utility of learned models on arbitrary subsets of markers prospectively. We show that our single neural network solution performs comparably to an upper bound scenario where an ensemble of many networks is naïvely trained for each possible marker combination separately. In addition, we demonstrate the feasibility of this framework in high-throughput biological analysis by revising a recent quantitative characterization of bone marrow vasculature in 3D confocal microscopy datasets and further confirm the validity of our approach on an additional, significantly different dataset of microvessels in fetal liver tissues. Not only can our work substantially ameliorate the use of deep learning in fluorescence microscopy analysis, but it can also be utilized in other fields with incomplete data acquisitions and missing modalities.

| S-EPMC7611676 | biostudies-literature

An open access medical knowledge base for community driven diagnostic decision support system development.

Project description:Introduction: While early diagnostic decision support systems were built around knowledge bases, more recent systems employ machine learning to consume large amounts of health data. We argue curated knowledge bases will remain an important component of future diagnostic decision support systems by providing ground truth and facilitating explainable human-computer interaction, but that prototype development is hampered by the lack of freely available computable knowledge bases. Methods: We constructed an open access knowledge base and evaluated its potential in the context of a prototype decision support system. We developed a modified set-covering algorithm to benchmark the performance of our knowledge base compared to existing platforms. Testing was based on case reports from selected literature and medical student preparatory material. Results: The knowledge base contains over 2000 ICD-10 coded diseases and 450 RX-Norm coded medications, with over 8000 unique observations encoded as SNOMED or LOINC semantic terms. Using 117 medical cases, we found the accuracy of the knowledge base and test algorithm to be comparable to established diagnostic tools such as Isabel and DXplain. Our prototype, as well as DXplain, showed the correct answer as "best suggestion" in 33% of the cases. While we identified shortcomings during development and evaluation, we found the knowledge base to be a promising platform for decision support systems. Conclusion: We built and successfully evaluated an open access knowledge base to facilitate the development of new medical diagnostic assistants. This knowledge base can be expanded and curated by users and serve as a starting point to facilitate new technology development and system improvement in many contexts.

| S-EPMC6486985 | biostudies-literature

Single-particle diffusional fingerprinting: A machine-learning framework for quantitative analysis of heterogeneous diffusion.

Project description:Single-particle tracking (SPT) is a key tool for quantitative analysis of dynamic biological processes and has provided unprecedented insights into a wide range of systems such as receptor localization, enzyme propulsion, bacteria motility, and drug nanocarrier delivery. The inherently complex diffusion in such biological systems can vary drastically both in time and across systems, consequently imposing considerable analytical challenges, and currently requires an a priori knowledge of the system. Here we introduce a method for SPT data analysis, processing, and classification, which we term "diffusional fingerprinting." This method allows for dissecting the features that underlie diffusional behavior and establishing molecular identity, regardless of the underlying diffusion type. The method operates by isolating 17 descriptive features for each observed motion trajectory and generating a diffusional map of all features for each type of particle. Precise classification of the diffusing particle identity is then obtained by training a simple logistic regression model. A linear discriminant analysis generates a feature ranking that outputs the main differences among diffusional features, providing key mechanistic insights. Fingerprinting operates by both training on and predicting experimental data, without the need for pretraining on simulated data. We found this approach to work across a wide range of simulated and experimentally diverse systems, such as tracked lipases on fat substrates, transcription factors diffusing in cells, and nanoparticles diffusing in mucus. This flexibility ultimately supports diffusional fingerprinting's utility as a universal paradigm for SPT diffusional analysis and prediction.

| S-EPMC8346862 | biostudies-literature

Integration of mechanistic immunological knowledge into a machine learning pipeline improves predictions.

Project description:The dense network of interconnected cellular signalling responses that are quantifiable in peripheral immune cells provides a wealth of actionable immunological insights. Although high-throughput single-cell profiling techniques, including polychromatic flow and mass cytometry, have matured to a point that enables detailed immune profiling of patients in numerous clinical settings, the limited cohort size and high dimensionality of data increase the possibility of false-positive discoveries and model overfitting. We introduce a generalizable machine learning platform, the immunological Elastic-Net (iEN), which incorporates immunological knowledge directly into the predictive models. Importantly, the algorithm maintains the exploratory nature of the high-dimensional dataset, allowing for the inclusion of immune features with strong predictive capabilities even if not consistent with prior knowledge. In three independent studies our method demonstrates improved predictions for clinically relevant outcomes from mass cytometry data generated from whole blood, as well as a large simulated dataset. The iEN is available under an open-source licence.

| S-EPMC7720904 | biostudies-literature
