Dataset Information

Term sets: A transparent and reproducible representation of clinical code sets.

ABSTRACT:

Objective

Clinical code sets are vital to research using routinely collected electronic healthcare data. Existing code set engineering methods impose significant limitations on reproducible research. To improve the transparency and reusability of research, these code sets must abide by the FAIR principles; this is not currently happening. We propose 'term sets', an equivalent alternative to code sets that is findable, accessible, interoperable and reusable.

Materials and methods

We describe a new code set representation, consisting of natural language inclusion and exclusion terms (term sets), and explain its relationship to code sets. We formally prove that any code set has a corresponding term set. We demonstrate utility by searching for recently published code sets, representing them as term sets, and reporting on the number of inclusion and exclusion terms compared with the size of the code set.
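As a rough illustration of how a term set relates to a code set, the sketch below resolves inclusion and exclusion terms against a tiny terminology. The codes and descriptions are invented, the `TermSet` and `resolve` names are hypothetical, and the matching rule (a case-insensitive substring test on each description) is an assumption made for illustration rather than the rule the authors necessarily use.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class TermSet:
    """Natural language inclusion and exclusion terms (hypothetical structure)."""
    include: frozenset = field(default_factory=frozenset)
    exclude: frozenset = field(default_factory=frozenset)


def resolve(term_set, terminology):
    """Return the codes whose description matches at least one inclusion
    term and no exclusion term.

    Matching is a case-insensitive substring test, which is an assumption
    made for this sketch.
    """
    def matches(description, terms):
        text = description.lower()
        return any(term.lower() in text for term in terms)

    return {
        code
        for code, description in terminology.items()
        if matches(description, term_set.include)
        and not matches(description, term_set.exclude)
    }


# Invented codes and descriptions, for illustration only.
TOY_TERMINOLOGY = {
    "X001": "Type 2 diabetes mellitus",
    "X002": "Type 1 diabetes mellitus",
    "X003": "Diabetes insipidus",
    "X004": "Family history of diabetes mellitus",
    "X005": "Essential hypertension",
}

diabetes = TermSet(
    include=frozenset({"diabetes mellitus"}),
    exclude=frozenset({"family history"}),
)

code_set = resolve(diabetes, TOY_TERMINOLOGY)
print(sorted(code_set))  # ['X001', 'X002']
```

Under these assumptions the term set is stored independently of any particular terminology: resolving it against a different coding system only requires passing a different dictionary to `resolve`.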

Results

Thirty-one code sets from 20 papers covering diverse disease domains were converted into term sets. On average, a term set was 74% of the size of its equivalent original code set. Four term sets were larger than their code sets because of deficiencies in the original code sets.
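The size comparison can be read as the number of terms divided by the number of codes. The sketch below assumes that the size of a term set is its count of inclusion plus exclusion terms, and that the average is the mean of the per-set ratios; the abstract does not state how the average was computed. The counts are made up and the `relative_size` helper is hypothetical.

```python
from statistics import mean


def relative_size(n_include, n_exclude, n_codes):
    """Term set size as a fraction of the size of the equivalent code set."""
    if n_codes <= 0:
        raise ValueError("code set must contain at least one code")
    return (n_include + n_exclude) / n_codes


# (inclusion terms, exclusion terms, codes): made-up counts, not the paper's data.
examples = [(3, 1, 10), (5, 0, 20), (8, 4, 10)]

ratios = [relative_size(*row) for row in examples]
average = mean(ratios)

# Term sets with a ratio above 1 are larger than the code set they represent.
larger = [row for row, ratio in zip(examples, ratios) if ratio > 1]

print(f"mean relative size: {average:.0%}")
print(f"term sets larger than their code set: {len(larger)}")
```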

Discussion

Term sets can concisely represent any code set. This may reduce barriers for examining and reusing code sets, which may accelerate research using healthcare databases. We have developed open-source software that supports researchers using term sets.

Conclusion

Term sets are independent of clinical code terminologies and therefore: enable reproducible research; are resistant to terminology changes; and are less error-prone as they are shorter than the equivalent code set.

SUBMITTER: Williams R

PROVIDER: S-EPMC6375602 | biostudies-literature | 2019

REPOSITORIES: biostudies-literature

Publications

Term sets: A transparent and reproducible representation of clinical code sets.

Richard Williams, Benjamin Brown, Evan Kontopantelis, Tjeerd van Staa, Niels Peek

PLoS One. 2019 Feb 14; issue 2

PMID: 30763407

Similar Datasets

Triboelectric-Based Transparent Secret Code.
| S-EPMC5908373 | biostudies-literature

Neural representation of transparent overlay.
| S-EPMC1820980 | biostudies-literature

Code sets out framework for "living wills".
| S-EPMC1403268 | biostudies-literature

Reproducible and transparent research practices in published neurology research.
| S-EPMC7049215 | biostudies-literature

Scanning the horizon: towards transparent and reproducible neuroimaging research.
| S-EPMC6910649 | biostudies-literature

Creating and sharing reproducible research code the workflowr way.
| S-EPMC6833990 | biostudies-literature

Prerequisite for reproducible science: a call to embrace code sharing.
| S-EPMC11386511 | biostudies-literature

Orchestrating and sharing large multimodal data for transparent and reproducible research.
| S-EPMC8490371 | biostudies-literature

Digital open science-Teaching digital tools for reproducible and transparent research.
| S-EPMC6095603 | biostudies-other

Matchtigs: minimum plain text representation of k-mer sets.
| S-EPMC10251615 | biostudies-literature