Dataset Information

A scalable software solution for anonymizing high-dimensional biomedical data.

ABSTRACT: Data anonymization is an important building block for ensuring privacy, and it fosters the reuse of data. However, transforming data in a way that preserves the privacy of subjects while maintaining a high degree of data quality is challenging, particularly for complex datasets that contain a large number of attributes. In this article we present how we extended the open source software ARX to improve its support for high-dimensional biomedical datasets. To improve ARX's ability to find optimal transformations when processing high-dimensional data, we implemented two novel search algorithms. The first is a greedy top-down approach modeled on a previously implemented bottom-up search. The second is based on a genetic algorithm. We evaluated the algorithms with different datasets, transformation methods, and privacy models. The novel algorithms mostly outperformed the previously implemented bottom-up search. In addition, we extended the GUI to provide a high degree of usability and performance when working with high-dimensional datasets. With these additions we have significantly enhanced ARX's ability to handle high-dimensional data, in terms of both processing performance and usability, and can thus further facilitate data sharing.
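
The abstract does not spell out how the greedy top-down search works. The sketch below is only one plausible reading of the idea, not the ARX implementation: it starts from the most generalized transformation and repeatedly steps down to the least lossy neighbor that still satisfies k-anonymity. The toy records, the two generalization hierarchies, the loss measure, and all function names are assumptions made for illustration.

```python
from collections import Counter

# Hypothetical toy data: (age, zip code) pairs treated as quasi-identifiers.
RECORDS = [
    (34, "81675"), (36, "81677"), (35, "81675"),
    (52, "80331"), (58, "80333"), (55, "80335"),
]

# Assumed hierarchy heights: age has levels 0..2, zip code has levels 0..5.
MAX_LEVELS = (2, 5)


def generalize_age(age, level):
    """Level 0 keeps the value, level 1 maps to a decade, level 2 suppresses."""
    if level == 0:
        return str(age)
    if level == 1:
        low = age // 10 * 10
        return f"{low}-{low + 9}"
    return "*"


def generalize_zip(zip_code, level):
    """Level n masks the last n digits of the zip code."""
    return zip_code[: len(zip_code) - level] + "*" * level


def transform(levels):
    """Apply one generalization level per attribute to every record."""
    return [
        (generalize_age(age, levels[0]), generalize_zip(zip_code, levels[1]))
        for age, zip_code in RECORDS
    ]


def is_k_anonymous(levels, k):
    """Every combination of generalized values must occur at least k times."""
    return min(Counter(transform(levels)).values()) >= k


def loss(levels):
    """Assumed quality measure: sum of normalized generalization levels."""
    return sum(level / top for level, top in zip(levels, MAX_LEVELS))


def greedy_top_down(k):
    """Walk down from the top of the lattice while k-anonymity still holds."""
    current = MAX_LEVELS
    if not is_k_anonymous(current, k):
        return None  # even full generalization cannot satisfy k
    while True:
        # Neighbors that lower exactly one attribute by one level.
        candidates = []
        for i, level in enumerate(current):
            if level > 0:
                lower = current[:i] + (level - 1,) + current[i + 1:]
                if is_k_anonymous(lower, k):
                    candidates.append(lower)
        if not candidates:
            return current  # no less generalized neighbor is still private
        current = min(candidates, key=loss)


if __name__ == "__main__":
    print(greedy_top_down(3))
```

Because each step commits to a single neighbor, a search of this kind evaluates only a small part of the transformation lattice, which is what makes it attractive for many attributes, but it may stop at a transformation that is not globally optimal.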

SUBMITTER: Meurers T

PROVIDER: S-EPMC8489190 | biostudies-literature | 2021 Oct

REPOSITORIES: biostudies-literature

Publications

A scalable software solution for anonymizing high-dimensional biomedical data.

Meurers T, Bild R, Do KM, Prasser F

GigaScience. 2021 Oct 1;10(10)

PMID: 34605868

Similar Datasets

Scalable Bayesian variable selection for structured high-dimensional data. | S-EPMC6222001 | biostudies-literature

Similarity-driven multi-view embeddings from high-dimensional biomedical data. | S-EPMC8009088 | biostudies-literature

Scalable analysis of multi-modal biomedical data. | S-EPMC8434767 | biostudies-literature

qHTSWaterfall: 3-dimensional visualization software for quantitative high-throughput screening (qHTS) data. | S-EPMC10064508 | biostudies-literature

EBIC: an open source software for high-dimensional and big data analyses. | S-EPMC6736067 | biostudies-literature

OpenStats: A robust and scalable software package for reproducible analysis of high-throughput phenotypic data. | S-EPMC7773254 | biostudies-literature

Scalable Clustering of High-Dimensional Data Technique Using SPCM with Ant Colony Optimization Intelligence. | S-EPMC4606166 | biostudies-other

tidytof: a user-friendly framework for scalable and reproducible high-dimensional cytometry data analysis. | S-EPMC10281957 | biostudies-literature

GATE: software for the analysis and visualization of high-dimensional time series expression data. | S-EPMC2796822 | biostudies-literature

An end-to-end software solution for the analysis of high-throughput single-cell migration data. | S-EPMC5304333 | biostudies-literature