Project description: Motivation: Fluorescence localization microscopy is extensively used to study the spatial architecture of subcellular compartments. This modality relies on determining the spatial positions of fluorophores labeling an extended biological structure with precision exceeding the diffraction limit. Several established models describe the influence of pixel size, signal-to-noise ratio and optical resolution on localization precision. Labeling density has also been recognized as an important factor affecting the fidelity with which the imaged biological structure is reconstructed. However, quantitative data on the combined influence of sampling and localization errors on reconstruction fidelity are scarce. It should be noted that processing localization microscopy data is similar to reconstructing a continuous (extended) non-periodic signal from non-uniform, noisy point samples. In two dimensions the problem may be formulated within the framework of matrix completion. However, no systematic approach has been adopted in microscopy, where images are typically rendered by representing localized molecules with Gaussian distributions whose widths are determined by the localization precision. Results: We analyze the process of two-dimensional reconstruction of extended biological structures as a function of the density of registered emitters, the localization precision and the area occupied by the rendered localized molecule. We quantify overall reconstruction fidelity with several established image similarity measures. Furthermore, we analyze the recovered similarity measure in frequency space for different reconstruction protocols, and compare the cut-off frequency to the limiting sampling frequency determined by the labeling density. Availability and implementation: The source code used in the simulations, along with test images, is available at https://github.com/blazi13/qbioimages. Contact: bruszczy@nencki.gov.pl or t.bernas@nencki.gov.pl.
Supplementary information: Supplementary data are available at Bioinformatics online.
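The rendering step described above, representing each localized molecule by a Gaussian whose width reflects its localization precision, can be sketched as follows. This is a minimal illustration; the function and parameter names are our own and do not come from the authors' released code at the repository above.

```python
import numpy as np

def render_gaussians(xs, ys, sigmas, shape, px=1.0):
    """Render localized emitters as unit-amplitude 2D Gaussians whose
    widths equal the per-emitter localization precision (same units as px)."""
    img = np.zeros(shape)
    yy, xx = np.mgrid[0:shape[0], 0:shape[1]]
    for x, y, s in zip(xs, ys, sigmas):
        s_px = s / px  # precision in pixel units
        img += np.exp(-((xx - x / px) ** 2 + (yy - y / px) ** 2)
                      / (2.0 * s_px ** 2))
    return img
```

A larger sigma spreads each molecule over a larger rendered area, which is exactly the trade-off between localization precision and reconstruction fidelity that the abstract analyzes.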
Project description: A topical review is presented of the rapidly developing interest in, and storage options for, the preservation and reuse of raw data within the scientific domain of the IUCr and its Commissions, each of which operates with a great diversity of instrumentation. A résumé of the case for raw diffraction data deposition is included. An overall context is set by highlighting the initiatives of science policy makers towards an 'Open Science' model within which crystallographers will increasingly work in the future; this will bring new funding opportunities but also new codes of procedure within open science frameworks. Skills education and training for crystallographers will need to be expanded accordingly. Overall, the means and the organization now exist for the preservation of raw crystallographic diffraction data in different types of archive, such as university repositories, discipline-specific repositories (Integrated Resource for Reproducibility in Macromolecular Crystallography, Structural Biology Data Grid), general public data repositories (Zenodo, ResearchGate) and centralized neutron and X-ray facilities. Formulation of improved metadata descriptors for the raw data types of each of the IUCr Commissions is in progress; some detailed examples are provided. A number of specific case studies are presented, including an example research thread that provides complete open access to raw data.
Project description: Background: The variety of medical documentation often leads to incompatible data elements that impede data integration between institutions. A common approach to standardizing and distributing metadata definitions is the ISO/IEC 11179 norm-compliant metadata repository with top-down standardization. To the best of our knowledge, however, it is not yet common practice to reuse the content of publicly accessible metadata repositories when creating case report forms or routine documentation. We suggest an alternative concept, called a pragmatic metadata repository, which enables a community-driven bottom-up approach to agreeing on data collection models. A pragmatic metadata repository collects real-world documentation and considers frequent metadata definitions to be of high quality, with potential for reuse. Methods: We implemented a proof-of-concept pragmatic metadata repository and filled it with medical forms from the Portal of Medical Data Models. We applied this prototype in two use cases to demonstrate its capabilities for reusing metadata: first, integration into a study editor for the suggestion of data elements and, second, metadata synchronization between two institutions. Moreover, we evaluated the emergence of bottom-up standards in the prototype, and two medical data managers assessed their quality for 24 medical concepts. Results: The resulting prototype contained 466,569 unique metadata definitions. Integration into the study editor led to the reuse of 1836 items and item groups. During metadata synchronization, semantic codes of 4608 data elements were transferred. Our evaluation revealed that weak bottom-up standards could be established for less complex medical concepts. However, more diverse disease-related concepts showed no convergence of data elements, owing to the enormous heterogeneity of their metadata.
The survey showed fair agreement (Krippendorff's alpha = 0.50, 95% CI 0.43-0.56) on the good item quality of the bottom-up standards. Conclusions: We demonstrated the feasibility of the pragmatic metadata repository concept for medical documentation. Applications of the prototype in two use cases suggest that it facilitates the reuse of data elements. Our evaluation showed that bottom-up standardization based on a large collection of real-world metadata can yield useful results. The proposed concept is not intended to replace existing top-down approaches; rather, it complements them by showing what is commonly used in the community, to guide other researchers.
Project description: Metadata that are structured using principled schemas and that use terms from ontologies are essential to making biomedical data findable and reusable for downstream analyses. The largest source of metadata describing the experimental protocol, funding and scientific leadership of clinical studies is ClinicalTrials.gov. We evaluated whether values in 302,091 trial records adhere to expected data types and use terms from biomedical ontologies, whether records contain the fields required by government regulations, and whether structured elements could replace free-text elements. Contact information, outcome measures and study design are frequently missing or underspecified. Important fields for search, such as condition and intervention, are not restricted to ontologies, and almost half of the conditions are not denoted by MeSH terms, as recommended. Eligibility criteria are stored as semi-structured free text. Enforcing the presence of all required elements, requiring values for certain fields to be drawn from ontologies, and creating a structured eligibility criteria element would improve the reusability of data from ClinicalTrials.gov in systematic reviews, meta-analyses and matching of eligible patients to trials.
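The kind of adherence checking described above can be illustrated with a toy validator that flags missing required fields and condition terms not drawn from a controlled vocabulary. The field names and the tiny MeSH subset below are hypothetical stand-ins, not the actual ClinicalTrials.gov schema or MeSH content.

```python
def validate_trial(record,
                   required=("brief_title", "condition", "overall_contact")):
    """Flag missing required fields and condition terms that are not
    drawn from a controlled vocabulary (toy MeSH subset below)."""
    mesh_subset = {"Diabetes Mellitus", "Hypertension", "Asthma"}  # illustrative only
    issues = [f"missing: {f}" for f in required if not record.get(f)]
    issues += [f"condition not in ontology: {c}"
               for c in record.get("condition", []) if c not in mesh_subset]
    return issues
```

A record whose condition list contains free text rather than an ontology term would be flagged, which is precisely the gap the abstract reports for almost half of the registered conditions.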
Project description: The Research Collaboratory for Structural Bioinformatics Protein Data Bank (RCSB PDB, rcsb.org), the US data center for the global PDB archive, serves thousands of Data Depositors in the Americas and Oceania and makes 3D macromolecular structure data available at no charge and without usage restrictions to more than 1 million rcsb.org Users worldwide and 600,000 pdb101.rcsb.org education-focused Users around the globe. PDB Data Depositors include structural biologists using macromolecular crystallography, nuclear magnetic resonance spectroscopy and 3D electron microscopy. PDB Data Consumers include researchers, educators and students studying Fundamental Biology, Biomedicine, Biotechnology and Energy. Recent reorganization of RCSB PDB activities into four integrated, interdependent services is described in detail, together with tools and resources added over the past 2 years to RCSB PDB web portals in support of a 'Structural View of Biology.'
Project description: Studying the association of gene function, diseases and regulatory gene network reconstruction demands data compatibility. Data from different databases follow distinct schemas and are accessible in heterogeneous ways. Although the experiments differ, the data may still relate to the same biological entities. Some entities may not be strictly biological, such as the geolocations of habitats or paper references, but they provide a broader context for other entities. The same entities from different datasets can share similar properties, which may or may not be found in other datasets. Joint, simultaneous data fetching from multiple sources is complicated for the end-user or, in many cases, unsupported and inefficient because of differences in data structures and in the ways the data are accessed. We propose BioGraph, a new model that enables connecting and retrieving information from linked biological data originating from diverse datasets. We tested the model on metadata collected from five diverse public datasets and successfully constructed a knowledge graph containing more than 17 million model objects, of which 2.5 million are individual biological entity objects. The model enables the selection of complex patterns and the retrieval of matched results that can be discovered only by joining data from multiple sources.
Project description: Background: One of the major objectives of the Multiple Sclerosis Data Alliance (MSDA) is to enable better discovery of multiple sclerosis (MS) real-world data (RWD). Methods: We implemented the MSDA Catalogue, which is available worldwide. The current version of the MSDA Catalogue collects descriptive information on governance, purpose, inclusion criteria, procedures for data quality control, and how and which data are collected, including the use of e-health technologies and the collection of COVID-19 variables. The current cataloguing procedure is performed in several manual steps, ensuring an effective catalogue. Results: Herein we summarize the status of the MSDA Catalogue as of January 6, 2021. To date, 38 data sources across five continents are included in the MSDA Catalogue. These data sources differ in purpose, maturity and variables collected, but this landscaping effort shows substantial alignment in some domains. The MSDA Catalogue shows that personal data and basic disease data are the most frequently collected categories of variables, whereas fatigue measurements and cognition scales are the least frequently collected in MS registries/cohorts. Conclusions: The Web-based MSDA Catalogue provides a strategic overview and allows authorized end users to browse metadata profiles of data cohorts and data sources. There are many existing and emerging RWD sources in MS. Detailed cataloguing of MS RWD is a first and useful step toward reducing the time needed to discover MS RWD sets and promoting collaboration.
Project description: Semantic segmentation of electron microscopy images using deep learning methods is a valuable tool for the detailed analysis of organelles and cell structures. However, these methods require a large amount of labeled ground-truth data that is often unavailable. To address this limitation, we present a weighted-average ensemble model that can automatically segment biological structures in electron microscopy images when trained with only a small dataset. We thereby exploit the fact that a combination of diverse base-learners can outperform a single segmentation model. Our experiments with seven different biological electron microscopy datasets demonstrate quantitative and qualitative improvements. We show that the Grad-CAM method can be used to interpret and verify the predictions of our model. Compared with a standard U-Net, the performance of our method is superior on all tested datasets. Furthermore, our model leverages a limited amount of labeled training data to segment electron microscopy images and therefore has high potential for automated biological applications.
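The core fusion step of a weighted-average ensemble can be sketched as follows: per-model probability maps are combined with normalized weights and thresholded to a binary mask. This is a generic sketch of the technique, not the authors' implementation; the weighting scheme they learn for each base-learner is not shown here.

```python
import numpy as np

def ensemble_segment(prob_maps, weights, threshold=0.5):
    """Fuse per-model probability maps (each H x W, values in [0, 1])
    with a weighted average, then threshold to a binary mask."""
    w = np.asarray(weights, dtype=float)
    w = w / w.sum()                       # normalize weights to sum to 1
    stacked = np.stack(prob_maps)         # shape: (n_models, H, W)
    fused = np.tensordot(w, stacked, axes=1)  # weighted average per pixel
    return fused >= threshold
```

Because the weights are normalized, a stronger base-learner pulls the fused probability toward its own prediction without any single model dominating outright.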
Project description: Data and metadata interoperability between data storage systems is a critical component of the FAIR data principles. Programmatic and consistent means of reconciling metadata models between databases promote data exchange and thus increase data access for the scientific community. This process requires (i) metadata mapping between the models and (ii) software to perform the mapping. Here, we describe our efforts to map metadata associated with genome assemblies between the National Center for Biotechnology Information (NCBI) data resources and the Chado biological database schema. We present mappings for multiple NCBI data structures and introduce a Tripal software module, Tripal EUtils, to pull metadata from NCBI into a Tripal/Chado database. We discuss potential mapping challenges and solutions and provide suggestions for future development to further increase interoperability between these platforms. Database URL: https://github.com/NAL-i5K/tripal_eutils.
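The two requirements named above, a mapping between the models and software to apply it, can be sketched as a simple lookup table plus a translation function. Both the NCBI keys and the Chado-style targets below are hypothetical placeholders for illustration; the real Tripal EUtils module defines its own mapping between NCBI EUtils records and Chado tables.

```python
# Hypothetical field names on both sides, for illustration only.
NCBI_TO_CHADO = {
    "AssemblyName": "analysis.name",
    "SubmitterOrganization": "analysis.program",
    "AsmReleaseDate_GenBank": "analysis.timeexecuted",
    "Taxid": "organism.dbxref",
}

def map_record(ncbi_record):
    """Translate an NCBI assembly metadata dict into Chado-style
    (table.column -> value) pairs, silently skipping unmapped keys."""
    return {chado: ncbi_record[ncbi]
            for ncbi, chado in NCBI_TO_CHADO.items()
            if ncbi in ncbi_record}
```

Keeping the mapping as data rather than code is one way to make the reconciliation "programmatic and consistent": adding a new field becomes a one-line table change.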
Project description: Microscopy image analysis has recently made enormous progress in both accuracy and speed, thanks to machine learning methods and improved computational resources. This greatly facilitates the online adaptation of microscopy experimental plans using real-time information about the observed systems and their environments. Applications in which such reactiveness is needed are multifarious. Here we report MicroMator, an open and flexible software for defining and driving reactive microscopy experiments. It provides a Python software environment and an extensible set of modules that greatly facilitate the definition of events, with triggers and effects that interact with the experiment. We provide a pedagogic example performing dynamic adaptation of fluorescence illumination on bacteria, and demonstrate MicroMator's potential via two challenging case studies in yeast: single-cell control and single-cell recombination, both requiring real-time tracking and light targeting at the single-cell level.
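The event abstraction described above, a trigger predicate paired with an effect acting on the experiment, can be sketched as follows. This is a simplified, hypothetical sketch of the pattern; MicroMator's actual Python API and module names differ.

```python
from dataclasses import dataclass
from typing import Callable

@dataclass
class Event:
    """A reactive-experiment event: when trigger(state) holds, run effect(state)."""
    trigger: Callable[[dict], bool]
    effect: Callable[[dict], None]

def run_step(events, state):
    """Evaluate every event against the current experiment state once."""
    for ev in events:
        if ev.trigger(state):
            ev.effect(state)
    return state

# Example event: switch illumination on when measured fluorescence
# drops below a setpoint (mirrors the dynamic-illumination example).
events = [Event(trigger=lambda s: s["fluo"] < s["setpoint"],
                effect=lambda s: s.update(light=True))]
```

Calling `run_step` once per acquisition cycle gives the closed loop that the single-cell control case study requires: each new image updates the state, and triggers fire against the freshest measurements.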