Dataset Information

BiobankUniverse: automatic matchmaking between datasets for biobank data discovery and integration.


ABSTRACT: Biobanks are indispensable for large-scale genetic/epidemiological studies, yet it remains difficult for researchers to determine which biobanks contain data matching their research questions. To overcome this, we developed a new matching algorithm that identifies pairs of related data elements between biobanks and research variables with high precision and recall. It integrates lexical comparison, Unified Medical Language System ontology tagging and semantic query expansion. The result is BiobankUniverse, a fast matchmaking service for biobanks and researchers. Biobankers upload their data elements and researchers upload their desired study variables; BiobankUniverse then automatically shortlists matching attributes between them. Users can quickly explore matching potential and search for biobanks/data elements matching their research. They can also curate matches and define personalized data-universes. BiobankUniverse is available at http://biobankuniverse.com or can be downloaded as part of the open source MOLGENIS suite at http://github.com/molgenis/molgenis. Contact: m.a.swertz@rug.nl. Supplementary data are available at Bioinformatics online.
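The abstract names lexical comparison and semantic query expansion as ingredients of the matching algorithm. The following is only a toy sketch of how those two ideas can combine, not the BiobankUniverse implementation: the `SYNONYMS` table is a made-up stand-in for ontology-derived synonyms, token-set Jaccard similarity is an assumed choice of lexical measure, and the function names and the 0.5 threshold are hypothetical.

```python
from itertools import product

# Hypothetical synonym table standing in for ontology-based expansion.
# A real system would derive such entries from an ontology; these are made up.
SYNONYMS = {
    "hypertension": {"high blood pressure"},
    "bmi": {"body mass index"},
}


def tokens(text):
    """Lower-case a label and split it into a set of whitespace tokens."""
    return set(text.lower().split())


def expand(label):
    """Return the label plus variants with known terms replaced by synonyms."""
    words = label.lower().split()
    variants = {" ".join(words)}
    for term, alternatives in SYNONYMS.items():
        if term in words:
            for alt in alternatives:
                variants.add(" ".join(alt if w == term else w for w in words))
    return variants


def jaccard(a, b):
    """Token-set Jaccard similarity between two labels (0.0 to 1.0)."""
    ta, tb = tokens(a), tokens(b)
    if not ta or not tb:
        return 0.0
    return len(ta & tb) / len(ta | tb)


def match_score(variable, element):
    """Best lexical similarity over all expanded variants of both labels."""
    return max(
        jaccard(v, e) for v, e in product(expand(variable), expand(element))
    )


def shortlist(variable, elements, threshold=0.5):
    """Return (score, element) pairs at or above the threshold, best first."""
    scored = [(match_score(variable, e), e) for e in elements]
    return sorted((pair for pair in scored if pair[0] >= threshold), reverse=True)
```

For example, `shortlist("bmi", ["body mass index", "height in cm"])` keeps only the first element, because expanding "bmi" yields a variant that matches it token for token, while the unexpanded label shares no tokens with either candidate.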

SUBMITTER: Pang C 

PROVIDER: S-EPMC5870622 | biostudies-literature | 2017 Nov

REPOSITORIES: biostudies-literature

Publications

BiobankUniverse: automatic matchmaking between datasets for biobank data discovery and integration.

Pang C, Kelpin F, van Enckevort D, Eklund N, Silander K, Hendriksen D, de Haan M, Jetten J, de Boer T, Charbon B, Holub P, Hillege H, Swertz MA

Bioinformatics (Oxford, England), 2017-11-01, issue 22

Similar Datasets

| S-EPMC8443160 | biostudies-literature
| S-EPMC2945010 | biostudies-other
| S-EPMC9351350 | biostudies-literature
| S-EPMC4720990 | biostudies-literature
| S-EPMC8662848 | biostudies-literature
| S-EPMC6051285 | biostudies-literature
| S-EPMC8221386 | biostudies-literature
| S-EPMC7241240 | biostudies-literature
| S-EPMC5095171 | biostudies-literature
| S-EPMC8723155 | biostudies-literature