ABSTRACT:
Motivation: While the size and number of biobanks, patient registries and other data collections are increasing, biomedical researchers still often need to pool data for statistical power, a task that requires time-intensive retrospective integration.
Results: To address this challenge, we developed MOLGENIS/connect, a semi-automatic system to find, match and pool data from different sources. The system shortlists relevant source attributes from thousands of candidates using ontology-based query expansion to overcome variations in terminology. It then generates algorithms that transform source attributes to a common target DataSchema. These include unit conversion, categorical value matching and complex conversion patterns (e.g. calculation of BMI). Compared with human experts, MOLGENIS/connect was able to auto-generate 27% of the algorithms perfectly, with an additional 46% needing only minor editing, representing a reduction in the human effort and expertise needed to pool data.
Availability and implementation: Source code, binaries and documentation are available as open-source under LGPLv3 from http://github.com/molgenis/molgenis and www.molgenis.org/connect
Contact: m.a.swertz@rug.nl
Supplementary information: Supplementary data are available at Bioinformatics online.
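As a rough illustration of the kinds of transformation algorithms named in the Results (unit conversion, categorical value matching and a derived value such as BMI), the following minimal Python sketch may help. It is not MOLGENIS/connect code and does not reflect its generated algorithm syntax; the attribute names (`height_cm`, `weight_kg`, `smoking_status`), the target codes in `SMOKING_MAP` and the helper functions are all hypothetical, chosen only for this example.

```python
# Illustrative sketch only: hypothetical attribute names and target codes,
# not the algorithms MOLGENIS/connect actually generates.

# Categorical value matching: source labels -> assumed target codes.
SMOKING_MAP = {"never": 0, "former": 1, "current": 2}


def cm_to_m(height_cm: float) -> float:
    """Unit conversion: centimetres to metres."""
    return height_cm / 100.0


def bmi(weight_kg: float, height_m: float) -> float:
    """Derived attribute: body mass index in kg/m^2."""
    return weight_kg / (height_m ** 2)


def transform(record: dict) -> dict:
    """Map one hypothetical source record onto a hypothetical target schema."""
    height_m = cm_to_m(record["height_cm"])
    label = record["smoking_status"].strip().lower()
    return {
        "bmi": round(bmi(record["weight_kg"], height_m), 1),
        # Unmatched labels become None so they can be reviewed by hand.
        "smoking": SMOKING_MAP.get(label),
    }


result = transform(
    {"height_cm": 180.0, "weight_kg": 81.0, "smoking_status": "Former"}
)
print(result)
```

Returning `None` for an unmatched category, rather than guessing, mirrors the semi-automatic spirit described above: the ambiguous cases are left for a human to edit.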
SUBMITTER: Pang C
PROVIDER: S-EPMC4937195 | biostudies-literature | 2016 Jul
REPOSITORIES: biostudies-literature
Pang C, van Enckevort D, de Haan M, Kelpin F, Jetten J, Hendriksen D, de Boer T, Charbon B, Winder E, van der Velde KJ, Doiron D, Fortier I, Hillege H, Swertz MA
Bioinformatics (Oxford, England) 20160321 14