Dataset Information

Harmonising and linking biomedical and clinical data across disparate data archives to enable integrative cross-biobank research.

ABSTRACT: A wealth of biospecimen samples are stored in modern globally distributed biobanks. Biomedical researchers worldwide need to be able to combine the available resources to improve the power of large-scale studies. A prerequisite for this effort is to be able to search and access phenotypic, clinical and other information about samples that are currently stored at biobanks in an integrated manner. However, privacy issues together with heterogeneous information systems and the lack of agreed-upon vocabularies have made specimen searching across multiple biobanks extremely challenging. We describe three case studies where we have linked samples and sample descriptions in order to facilitate global searching of available samples for research. The use cases include the ENGAGE (European Network for Genetic and Genomic Epidemiology) consortium comprising at least 39 cohorts, the SUMMIT (surrogate markers for micro- and macro-vascular hard endpoints for innovative diabetes tools) consortium and a pilot for data integration between a Swedish clinical health registry and a biobank. We used the Sample avAILability (SAIL) method for data linking: we first created harmonised variables, and then annotated and made searchable the information on the number of specimens available in individual biobanks for various phenotypic categories. By operating on this categorised availability data we sidestep many obstacles related to privacy that arise when handling real values. We show that harmonised and annotated records about data availability across disparate biomedical archives provide a key methodological advance in pre-analysis exchange of information between biobanks, that is, during the project planning phase.
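
The idea of sharing availability counts rather than measured values can be illustrated with a much-simplified sketch. This is not the SAIL implementation: the biobank names, local variable names, harmonised variable names, records and the helper functions `availability` and `search` are all hypothetical, chosen only to show the principle under the assumption that each biobank maps its local variables to a shared vocabulary and exports counts only.

```python
from collections import Counter

# Hypothetical mapping from each biobank's local variable names to harmonised ones.
HARMONISED = {
    "biobank_a": {"bmi_kg_m2": "bmi", "t2d_dx": "type_2_diabetes"},
    "biobank_b": {"body_mass_index": "bmi", "diabetes_type2": "type_2_diabetes"},
}


def availability(biobank, records):
    """Count, per harmonised variable, how many samples have a recorded value.

    Only the counts are returned; the measured values themselves are not.
    """
    mapping = HARMONISED[biobank]
    counts = Counter()
    for record in records:
        for local_name, value in record.items():
            harmonised = mapping.get(local_name)
            if harmonised is not None and value is not None:
                counts[harmonised] += 1
    return dict(counts)


def search(index, variable):
    """Return the number of samples with `variable` available in each biobank."""
    return {biobank: counts.get(variable, 0) for biobank, counts in index.items()}


# Illustrative, made-up records (not data from the paper).
records_a = [
    {"bmi_kg_m2": 24.1, "t2d_dx": True},
    {"bmi_kg_m2": None, "t2d_dx": False},
]
records_b = [
    {"body_mass_index": 31.0, "diabetes_type2": None},
]

# The searchable index holds availability counts only.
index = {
    "biobank_a": availability("biobank_a", records_a),
    "biobank_b": availability("biobank_b", records_b),
}
print(search(index, "bmi"))
```

In this sketch a researcher planning a study can ask which biobanks hold samples with a given harmonised variable without any individual-level value leaving the biobank.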

SUBMITTER: Spjuth O

PROVIDER: S-EPMC4929882 | biostudies-literature | 2016 Apr

REPOSITORIES: biostudies-literature

Publications

Harmonising and linking biomedical and clinical data across disparate data archives to enable integrative cross-biobank research.

Spjuth Ola, Krestyaninova Maria, Hastings Janna, Shen Huei-Yi, Heikkinen Jani, Waldenberger Melanie, Langhammer Arnulf, Ladenvall Claes, Esko Tõnu, Persson Mats-Åke, Heggland Jon, Dietrich Joern, Ose Sandra, Gieger Christian, Ried Janina S, Peters Annette, Fortier Isabel, de Geus Eco J C, Klovins Janis, Zaharenko Linda, Willemsen Gonneke, Hottenga Jouke-Jan, Litton Jan-Eric, Karvanen Juha, Boomsma Dorret I, Groop Leif, Rung Johan, Palmgren Juni, Pedersen Nancy L, McCarthy Mark I, van Duijn Cornelia M, Hveem Kristian, Metspalu Andres, Ripatti Samuli, Prokopenko Inga, Harris Jennifer R

European journal of human genetics : EJHG, 2015-08-26, issue 4

PMID: 26306643

Similar Datasets

Project description: Background: Biobanks are highly organized infrastructures that allow the storage of human biological specimens associated with donors' personal and clinical data. These infrastructures play a key role in the development of translational medical research. In this context, we launched, in November 2015, the first biobank in Morocco (BRO Biobank) in order to promote biomedical research and provide opportunities to include Moroccan and North African ethnic groups in international biomedical studies. Here, we present the setup and the sample characteristics of BRO Biobank. Methods: Patients were recruited at several departments of two major health-care centers in the city of Oujda. Healthy donors were enrolled during blood donation campaigns all over Eastern Morocco. From each participant, personal, clinical, and biomedical data were collected, and several biospecimens were stored. Standard operating procedures have been established in accordance with international guidelines on human biobanks. Results: Between November 2015 and July 2020, 2446 participants were recruited into the BRO Biobank, of whom 2013 were healthy donors, and 433 were patients. For healthy donors, the median age was 35 years with a range between 18 and 65 years and the consanguinity rate was 28.96%. For patients, the median age was 11 years with a range between 1 day and 83 years. Among these patients, 55% had rare diseases (hemoglobinopathies, intellectual disabilities, disorders of sex differentiation, myopathies, etc.), 13% had lung cancer, 4% suffered from hematological neoplasms, 3% were from the kidney transplantation project, and 25% had unknown diagnoses. The BRO Biobank has collected 5092 biospecimens, including blood, white blood cells, plasma, serum, urine, frozen tissue, FFPE tissue, and nucleic acids. A sample quality control has been implemented and suggested that samples of the BRO Biobank are of high quality and therefore suitable for high-throughput nucleic acid analysis. Conclusions: The BRO Biobank is the largest sample collection in Morocco, and it is ready to provide samples to national and international research projects. Therefore, the BRO Biobank is a valuable resource for advancing translational medical research.

Project description: Background: Data provenance refers to the origin, processing, and movement of data. Reliable and precise knowledge about data provenance has great potential to improve reproducibility as well as quality in biomedical research and, therefore, to foster good scientific practice. However, despite the increasing interest in data provenance technologies in the literature and their implementation in other disciplines, these technologies have not yet been widely adopted in biomedical research. Objective: The aim of this scoping review was to provide a structured overview of the body of knowledge on provenance methods in biomedical research by systematizing articles covering data provenance technologies developed for or used in this application area; describing and comparing the functionalities as well as the design of the provenance technologies used; and identifying gaps in the literature, which could provide opportunities for future research on technologies that could receive more widespread adoption. Methods: Following a methodological framework for scoping studies and the PRISMA-ScR (Preferred Reporting Items for Systematic Reviews and Meta-Analyses Extension for Scoping Reviews) guidelines, articles were identified by searching the PubMed, IEEE Xplore, and Web of Science databases and subsequently screened for eligibility. We included original articles covering software-based provenance management for scientific research published between 2010 and 2021. A set of data items was defined along the following five axes: publication metadata, application scope, provenance aspects covered, data representation, and functionalities. The data items were extracted from the articles, stored in a charting spreadsheet, and summarized in tables and figures. Results: We identified 44 original articles published between 2010 and 2021. We found that the solutions described were heterogeneous along all axes. We also identified relationships among motivations for the use of provenance information, feature sets (capture, storage, retrieval, visualization, and analysis), and implementation details such as the data models and technologies used. The important gap that we identified is that only a few publications address the analysis of provenance data or use established provenance standards, such as PROV. Conclusions: The heterogeneity of provenance methods, models, and implementations found in the literature points to the lack of a unified understanding of provenance concepts for biomedical data. Providing a common framework, a biomedical reference, and benchmarking data sets could foster the development of more comprehensive provenance solutions.

Project description: Genomics and molecular imaging, along with clinical and translational research, have transformed biomedical science into a data-intensive scientific endeavor. For researchers to benefit from Big Data sets, developing a long-term biomedical digital data preservation strategy is very important. In this opinion article, we discuss specific actions that researchers and institutions can take to make research data a continued resource even after research projects have reached the end of their lifecycle. The actions involve utilizing an Open Archival Information System model comprised of six functional entities: Ingest, Access, Data Management, Archival Storage, Administration and Preservation Planning. We believe that involvement of data stewards early in the digital data life-cycle management process can significantly contribute towards long-term preservation of biomedical data. Developing data collection strategies consistent with institutional policies, and encouraging the use of common data elements in clinical research, patient registries and other human subject research can be advantageous for data sharing and integration purposes. Specifically, data stewards at the onset of a research program should engage with established repositories and curators to develop data sustainability plans for research data. Placing equal importance on the requirements for initial activities (e.g., collection, processing, storage) and subsequent activities (data analysis, sharing) can improve data quality, provide traceability and support reproducibility. Preparing and tracking data provenance, and using common data elements and biomedical ontologies, are important for standardizing the data description, making the interpretation and reuse of data easier. The Big Data biomedical community requires a scalable platform that can support the diversity and complexity of data ingest modes (e.g. machine, software or human entry modes). Secure virtual workspaces to integrate and manipulate data, with shared software programs (e.g., bioinformatics tools), can facilitate the FAIR (Findable, Accessible, Interoperable and Reusable) use of data for near- and long-term research needs.

Project description: Modern biomedical research aims at drawing biological conclusions from large, highly complex biological datasets. It has become common practice to make extensive use of high-throughput technologies that produce large amounts of heterogeneous data. In addition to the ever-improving accuracy, methods are getting faster and cheaper, resulting in a steadily increasing need for scalable data management and easily accessible means of analysis. We present qPortal, a platform providing users with an intuitive way to manage and analyze quantitative biological data. The backend leverages a variety of concepts and technologies, such as relational databases, data stores, data models and means of data transfer, as well as front-end solutions to give users access to data management and easy-to-use analysis options. Users are empowered to conduct their experiments from the experimental design to the visualization of their results through the platform. Here, we illustrate the feature-rich portal by simulating a biomedical study based on publicly available data. We demonstrate the software's strength in supporting the entire project life cycle. The software supports the project design and registration, empowers users to do all-digital project management and finally provides means to perform analysis. We compare our approach to Galaxy, one of the most widely used scientific workflow and analysis platforms in computational biology. Application of both systems to a small case study shows the differences between a data-driven approach (qPortal) and a workflow-driven approach (Galaxy). qPortal, a one-stop-shop solution for biomedical projects, offers up-to-date analysis pipelines, quality control workflows, and visualization tools. Through intensive user interactions, appropriate data models have been developed. These models build the foundation of our biological data management system and provide possibilities to annotate data, query metadata for statistics and future re-analysis on high-performance computing systems via coupling of workflow management systems. Integration of project and data management as well as workflow resources in one place presents clear advantages over existing solutions.

OmicsDI is part of the ELIXIR infrastructure