Dataset Information

MOLGENIS/connect: a system for semi-automatic integration of heterogeneous phenotype data with applications in biobanks.


ABSTRACT:

Motivation

While the size and number of biobanks, patient registries and other data collections are increasing, biomedical researchers still often need to pool data for statistical power, a task that requires time-intensive retrospective integration.

Results

To address this challenge, we developed MOLGENIS/connect, a semi-automatic system to find, match and pool data from different sources. The system shortlists relevant source attributes from thousands of candidates using ontology-based query expansion to overcome variations in terminology. It then generates algorithms that transform source attributes to a common target DataSchema. These include unit conversion, categorical value matching and complex conversion patterns (e.g. calculation of BMI). Compared with human experts, MOLGENIS/connect was able to auto-generate 27% of the algorithms perfectly, with an additional 46% needing only minor editing, representing a reduction in the human effort and expertise needed to pool data.
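The two stages described above (shortlisting candidate source attributes, then transforming them to the target schema) can be illustrated with a minimal Python sketch. This is not the MOLGENIS/connect implementation: the synonym table stands in for an ontology lookup, and every function name, the similarity measure and the 0.6 threshold are assumptions chosen for illustration.

```python
from difflib import SequenceMatcher

# Hypothetical toy synonym table standing in for an ontology lookup.
SYNONYMS = {
    "hypertension": ["high blood pressure", "hbp"],
    "weight": ["body weight", "body mass"],
}


def expand_query(term):
    """Return the lower-cased term followed by its known synonyms."""
    key = term.lower()
    return [key] + SYNONYMS.get(key, [])


def shortlist(target, source_labels, threshold=0.6):
    """Rank source labels by best string similarity to any expanded query term."""
    queries = expand_query(target)
    scored = []
    for label in source_labels:
        score = max(SequenceMatcher(None, q, label.lower()).ratio() for q in queries)
        if score >= threshold:  # assumed cut-off, not taken from the paper
            scored.append((label, score))
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


def cm_to_m(height_cm):
    """Unit conversion: centimetres to metres."""
    return height_cm / 100.0


def bmi(weight_kg, height_cm):
    """Complex conversion pattern: BMI = weight (kg) / height (m) squared."""
    height_m = cm_to_m(height_cm)
    return weight_kg / (height_m * height_m)


def match_category(value, mapping):
    """Map a source categorical code to the target code; None when unmapped."""
    return mapping.get(value)


candidates = shortlist("Hypertension", ["High blood pressure", "Shoe size"])
sex = match_category("1", {"1": "male", "2": "female"})
body_mass_index = bmi(70, 175)
```

In this sketch, a source attribute labelled "High blood pressure" is shortlisted for the target "Hypertension" only because of the synonym expansion; plain string matching on the original term would miss it. Unmapped categorical values return `None` so that they can be flagged for manual review, mirroring the semi-automatic workflow the abstract describes.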

Availability and implementation

Source code, binaries and documentation are available as open-source under LGPLv3 from http://github.com/molgenis/molgenis and www.molgenis.org/connect

Contact

m.a.swertz@rug.nl

Supplementary information

Supplementary data are available at Bioinformatics online.

SUBMITTER: Pang C 

PROVIDER: S-EPMC4937195 | biostudies-literature | 2016 Jul

REPOSITORIES: biostudies-literature

Publications

MOLGENIS/connect: a system for semi-automatic integration of heterogeneous phenotype data with applications in biobanks.

Pang C, van Enckevort D, de Haan M, Kelpin F, Jetten J, Hendriksen D, de Boer T, Charbon B, Winder E, van der Velde KJ, Doiron D, Fortier I, Hillege H, Swertz MA

Bioinformatics (Oxford, England), 2016-03-21, issue 14

Similar Datasets

| S-EPMC3516146 | biostudies-literature
| S-EPMC7157985 | biostudies-literature
| S-EPMC3683061 | biostudies-literature
| S-EPMC8027235 | biostudies-literature
| S-EPMC10825117 | biostudies-literature
| S-EPMC3035801 | biostudies-other
| S-EPMC2929138 | biostudies-literature
| S-EPMC5892938 | biostudies-literature
| S-EPMC6474416 | biostudies-literature
| S-EPMC10241933 | biostudies-literature