Project description:Differential privacy allows quantifying the privacy loss that results from accessing sensitive personal data. Repeated accesses to the underlying data incur increasing loss. Releasing data as privacy-preserving synthetic data would avoid this limitation but would leave open the problem of deciding what kind of synthetic data to release. We propose formulating the problem of private data release through probabilistic modeling. This approach transforms the problem of designing the synthetic data into choosing a model for the data, and also allows the inclusion of prior knowledge, which improves the quality of the synthetic data. We demonstrate empirically, in an epidemiological study, that statistical discoveries can be reliably reproduced from the synthetic data. We expect the method to have broad use in creating high-quality anonymized data twins of key datasets for research.
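To make the "choose a model, privatize it, then sample" idea concrete, here is a minimal Python sketch that perturbs the counts of one categorical variable with Laplace noise and samples synthetic values from the noisy distribution. It illustrates only the basic mechanism under an add/remove-one notion of neighbouring datasets; the full probabilistic-modeling approach, the epsilon value, and the category names are not specified by the description above and are assumptions here.

```python
import numpy as np

def dp_synthetic_categorical(values, categories, epsilon, n_synth, seed=None):
    """Release n_synth synthetic draws of one categorical attribute.

    Counts are perturbed with Laplace noise of scale 1/epsilon, calibrated
    to add/remove-one adjacency (one record changes one count by at most 1).
    This is a toy illustration, not a full synthetic-data pipeline.
    """
    rng = np.random.default_rng(seed)
    values = np.asarray(values)
    counts = np.array([(values == c).sum() for c in categories], dtype=float)
    noisy = counts + rng.laplace(scale=1.0 / epsilon, size=len(counts))
    probs = np.clip(noisy, 0.0, None)          # negative noisy counts -> 0
    if probs.sum() == 0:                        # all counts noised below zero
        probs = np.ones_like(probs)
    probs /= probs.sum()
    return rng.choice(categories, size=n_synth, p=probs)

# Example: a hypothetical smoking-status column released as synthetic data.
synthetic = dp_synthetic_categorical(
    ["never", "former", "current", "never"],
    ["never", "former", "current"],
    epsilon=1.0, n_synth=100)
```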
Project description:Greater sharing of potentially sensitive data raises important ethical, legal and social issues (ELSI), which risk hindering and even preventing useful data sharing if not properly addressed. One such important issue is respecting the privacy-related interests of individuals whose data are used in genomic research and clinical care. As part of the Global Alliance for Genomics and Health (GA4GH), we examined the ELSI status of health-related data that are typically considered 'sensitive' in international policy and data protection laws. We propose that 'tiered protection' of such data could be implemented in contexts such as that of the GA4GH Beacon Project to facilitate responsible data sharing. To this end, we discuss a Data Sharing Privacy Test developed to distinguish degrees of sensitivity within categories of data recognised as 'sensitive'. Based on this, we propose guidance for determining the level of protection when sharing genomic and health-related data for the Beacon Project and in other international data sharing initiatives.
Project description:The human genetics community needs robust protocols that enable secure sharing of genomic data from participants in genetic research. Beacons are web servers that answer allele-presence queries--such as "Do you have a genome that has a specific nucleotide (e.g., A) at a specific genomic position (e.g., position 11,272 on chromosome 1)?"--with either "yes" or "no." Here, we show that individuals in a beacon are susceptible to re-identification even if the only data shared include presence or absence information about alleles in a beacon. Specifically, we propose a likelihood-ratio test of whether a given individual is present in a given genetic beacon. Our test is not dependent on allele frequencies and is the most powerful test for a specified false-positive rate. Through simulations, we showed that in a beacon with 1,000 individuals, re-identification is possible with just 5,000 queries. Relatives can also be identified in the beacon. Re-identification is possible even in the presence of sequencing errors and variant-calling differences. In a beacon constructed with 65 European individuals from the 1000 Genomes Project, we demonstrated that it is possible to detect membership in the beacon with just 250 SNPs. With just 1,000 SNP queries, we were able to detect the presence of an individual genome from the Personal Genome Project in an existing beacon. Our results show that beacons can disclose membership and implied phenotypic information about participants and do not protect privacy a priori. We discuss risk mitigation through policies and standards such as not allowing anonymous pings of genetic beacons and requiring minimum beacon sizes.
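A simplified form of such a membership test can be sketched in Python. Unlike the allele-frequency-free statistic described above, this toy log-likelihood ratio assumes known population allele frequencies and a single mismatch parameter for sequencing error and variant-calling differences; the function name and interface are illustrative only.

```python
import numpy as np

def beacon_membership_llr(responses, alt_freqs, beacon_size, mismatch=1e-3):
    """Log-likelihood ratio for 'the target is in the beacon' vs. 'is not'.

    responses   : 0/1 beacon answers at positions where the target carries
                  the alternate allele.
    alt_freqs   : population alternate-allele frequencies at those positions.
    beacon_size : number of (diploid) individuals in the beacon.
    mismatch    : chance the target's allele is absent despite membership.
    """
    x = np.asarray(responses, dtype=float)
    f = np.clip(np.asarray(alt_freqs, dtype=float), 1e-6, 1 - 1e-6)
    # "Yes" probability if the target is NOT a member: at least one of the
    # 2N haplotypes in the beacon carries the allele by chance.
    p_null = 1.0 - (1.0 - f) ** (2 * beacon_size)
    # "Yes" probability if the target IS a member: a "no" requires an error
    # on the target's copy and absence in all remaining 2N-2 haplotypes.
    p_alt = 1.0 - mismatch * (1.0 - f) ** (2 * beacon_size - 2)
    # Clip away from 0/1 for numerical stability before taking logs.
    p_null = np.clip(p_null, 1e-12, 1 - 1e-12)
    p_alt = np.clip(p_alt, 1e-12, 1 - 1e-12)
    llr = x * np.log(p_alt / p_null) + (1 - x) * np.log((1 - p_alt) / (1 - p_null))
    return llr.sum()   # large positive values support membership
```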
Project description:Although the privacy issues in human genomic studies are well known, the privacy risks in clinical proteomic data have not been thoroughly studied. As a proof of concept, we reported a comprehensive analysis of the privacy risks in clinical proteomic data. The analysis showed that a small number of peptides carrying the minor alleles (referred to as minor allelic peptides) at non-synonymous single nucleotide polymorphism (nsSNP) sites can be identified in typical clinical proteomic datasets acquired from the blood/serum samples of individual patients, from which the patient can be identified with high confidence. Our results suggested the presence of significant privacy risks in raw clinical proteomic data. However, these risks can be mitigated by a straightforward pre-processing step that removes the very small fraction (0.1%, or 7.14 out of 7,504 spectra on average) of MS/MS spectra identified as minor allelic peptides, which has little or no impact on the subsequent analysis (and re-use) of these datasets.
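The mitigation step described above is, in essence, a filter over identified spectra. The sketch below assumes the identifications are available as a list of dicts with a "peptide" key, which is a simplification of real search-engine output formats; it only illustrates the bookkeeping, not the identification of minor allelic peptides itself.

```python
def strip_minor_allelic_spectra(identified_spectra, minor_allelic_peptides):
    """Remove MS/MS spectra whose peptide identification carries a minor
    (nsSNP) allele; all other spectra are kept for downstream analysis.

    identified_spectra      : list of dicts with at least a "peptide" key
    minor_allelic_peptides  : iterable of minor-allele peptide sequences
    """
    blacklist = set(minor_allelic_peptides)
    kept = [s for s in identified_spectra if s["peptide"] not in blacklist]
    removed = len(identified_spectra) - len(kept)
    return kept, removed
```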
Project description:The sharing of genomic data holds great promise in advancing precision medicine and providing personalized treatments and other types of interventions. However, these opportunities come with privacy concerns, and data misuse could potentially lead to privacy infringement for individuals and their blood relatives. With the rapid growth and increased availability of genomic datasets, understanding the current genome privacy landscape and identifying the challenges in developing effective privacy-protecting solutions are imperative. In this work, we provide an overview of major privacy threats identified by the research community and examine the privacy challenges in the context of emerging direct-to-consumer genetic-testing applications. We additionally present general privacy-protection techniques for genomic data sharing and their potential applications in direct-to-consumer genomic testing and forensic analyses. Finally, we discuss limitations in current privacy-protection methods, highlight possible mitigation strategies and suggest future research opportunities for advancing genomic data sharing.
Project description:BACKGROUND:Sharing research data uses resources effectively; enables large, diverse data sets; and supports rigor and reproducibility. However, sharing such data increases privacy risks for participants who may be re-identified by linking study data to outside data sets. These risks have been investigated for genetic and medical records but rarely for environmental data. OBJECTIVES:We evaluated how data in environmental health (EH) studies may be vulnerable to linkage and we investigated, in a case study, whether environmental measurements could contribute to inferring latent categories (e.g., geographic location), which increases privacy risks. METHODS:We identified 12 prominent EH studies, reviewed the data types collected, and evaluated the availability of outside data sets that overlap with study data. With data from the Household Exposure Study in California and Massachusetts and the Green Housing Study in Boston, Massachusetts, and Cincinnati, Ohio, we used k-means clustering and principal component analysis to investigate whether participants' region of residence could be inferred from measurements of chemicals in household air and dust. RESULTS:All 12 studies included at least two of five data types that overlap with outside data sets: geographic location (9 studies), medical data (9 studies), occupation (10 studies), housing characteristics (10 studies), and genetic data (7 studies). In our cluster analysis, participants' region of residence could be inferred with 80%-98% accuracy using environmental measurements with original laboratory reporting limits. DISCUSSION:EH studies frequently include data that are vulnerable to linkage with voter lists, tax and real estate data, professional licensing lists, and ancestry websites, and exposure measurements may be used to identify subgroup membership, increasing likelihood of linkage. Thus, unsupervised sharing of EH research data potentially raises substantial privacy risks. Empirical research can help characterize risks and evaluate technical solutions. Our findings reinforce the need for legal and policy protections to shield participants from potential harms of re-identification from data sharing. https://doi.org/10.1289/EHP4817.
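The clustering analysis can be approximated with standard scikit-learn components, as in the sketch below: log-scaled, standardized chemical measurements are reduced with PCA, clustered with k-means, and each cluster is assigned its majority region to score how well residence can be inferred. The transform, number of components, and variable layout are assumptions rather than the study's exact protocol.

```python
import numpy as np
from collections import Counter
from sklearn.preprocessing import StandardScaler
from sklearn.decomposition import PCA
from sklearn.cluster import KMeans
from sklearn.metrics import accuracy_score

def region_inference_accuracy(X, regions, n_components=5, random_state=0):
    """X: samples x analytes matrix of household air/dust concentrations.
    regions: array of region labels, used only to score the clustering."""
    regions = np.asarray(regions)
    Xs = StandardScaler().fit_transform(np.log1p(X))        # log-scale, standardise
    Xp = PCA(n_components=n_components).fit_transform(Xs)   # main axes of variation
    k = len(np.unique(regions))
    clusters = KMeans(n_clusters=k, n_init=10,
                      random_state=random_state).fit_predict(Xp)
    # Assign each cluster its majority region, then score the induced labels.
    predicted = np.empty_like(regions)
    for c in np.unique(clusters):
        majority = Counter(regions[clusters == c]).most_common(1)[0][0]
        predicted[clusters == c] = majority
    return accuracy_score(regions, predicted)
```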
Project description:Background: Data sharing accelerates scientific progress, but sharing individual-level data while preserving patient privacy presents a barrier. Methods and results: Using pairs of deep neural networks, we generated simulated, synthetic participants that closely resemble participants of the SPRINT trial (Systolic Blood Pressure Intervention Trial). We showed that such paired networks can be trained with differential privacy, a formal privacy framework that limits the likelihood that queries of the synthetic participants' data could identify a real participant in the trial. Machine learning predictors built on the synthetic population generalize to the original data set. This finding suggests that the synthetic data can be shared with others, enabling them to perform hypothesis-generating analyses as though they had the original trial data. Conclusions: Deep neural networks that generate synthetic participants facilitate secondary analyses and reproducible investigation of clinical data sets by enhancing data sharing while preserving participant privacy.
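The "pairs of deep neural networks" are a generator/discriminator pair in the style of a generative adversarial network. The PyTorch sketch below is heavily simplified: a proper differentially private training loop clips per-example gradients and tracks the cumulative privacy budget (for example with a moments accountant), whereas here a single clipped-and-noised discriminator update only marks where the privacy mechanism sits. Dimensions, architectures, and noise parameters are placeholders, not those used for the SPRINT data.

```python
import torch
import torch.nn as nn

DATA_DIM, NOISE_DIM = 10, 32   # placeholder sizes, not the SPRINT covariates

generator = nn.Sequential(nn.Linear(NOISE_DIM, 64), nn.ReLU(),
                          nn.Linear(64, DATA_DIM))
discriminator = nn.Sequential(nn.Linear(DATA_DIM, 64), nn.ReLU(),
                              nn.Linear(64, 1))
g_opt = torch.optim.Adam(generator.parameters(), lr=1e-3)
d_opt = torch.optim.Adam(discriminator.parameters(), lr=1e-3)
bce = nn.BCEWithLogitsLoss()

def train_step(real_batch, clip_norm=1.0, noise_mult=1.1):
    """One adversarial step. Only the discriminator sees real records, so the
    (crude) gradient clipping plus Gaussian noise is applied to its update."""
    n = real_batch.size(0)
    fake = generator(torch.randn(n, NOISE_DIM))

    # Discriminator update on real vs. generated records.
    d_opt.zero_grad()
    d_loss = (bce(discriminator(real_batch), torch.ones(n, 1)) +
              bce(discriminator(fake.detach()), torch.zeros(n, 1)))
    d_loss.backward()
    torch.nn.utils.clip_grad_norm_(discriminator.parameters(), clip_norm)
    for p in discriminator.parameters():
        # NOTE: real DP-SGD clips per-example gradients; this batch-level
        # noise only illustrates where the mechanism is inserted.
        p.grad += noise_mult * clip_norm / n * torch.randn_like(p.grad)
    d_opt.step()

    # Generator update; it never touches real data directly.
    g_opt.zero_grad()
    g_loss = bce(discriminator(fake), torch.ones(n, 1))
    g_loss.backward()
    g_opt.step()
    return d_loss.item(), g_loss.item()
```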
Project description:BACKGROUND:The exploitation of synthetic data in health care is at an early stage. Synthetic data could unlock the potential within health care datasets that are too sensitive for release. Several synthetic data generators have been developed to date; however, studies evaluating their efficacy and generalizability are scarce. OBJECTIVE:This work sets out to understand the difference in performance of supervised machine learning models trained on synthetic data compared with those trained on real data. METHODS:A total of 19 open health datasets were selected for experimental work. Synthetic data were generated using three synthetic data generators that apply classification and regression trees, parametric, and Bayesian network approaches. Real and synthetic data were used (separately) to train five supervised machine learning models: stochastic gradient descent, decision tree, k-nearest neighbors, random forest, and support vector machine. Models were tested only on real data to determine whether a model developed by training on synthetic data can be used to accurately classify new, real examples. The impact of statistical disclosure control on model performance was also assessed. RESULTS:A total of 92% of models trained on synthetic data have lower accuracy than those trained on real data. Tree-based models trained on synthetic data have deviations in accuracy from models trained on real data of 0.177 (18%) to 0.193 (19%), while other models have lower deviations of 0.058 (6%) to 0.072 (7%). The winning classifier when trained and tested on real data versus models trained on synthetic data and tested on real data is the same in 26% (5/19) of cases for classification and regression tree and parametric synthetic data and in 21% (4/19) of cases for Bayesian network-generated synthetic data. Tree-based models perform best with real data and are the winning classifier in 95% (18/19) of cases. This is not the case for models trained on synthetic data. When tree-based models are not considered, the winning classifier for real and synthetic data is matched in 74% (14/19), 53% (10/19), and 68% (13/19) of cases for classification and regression tree, parametric, and Bayesian network synthetic data, respectively. Statistical disclosure control methods did not have a notable impact on data utility. CONCLUSIONS:The results of this study are promising, with small decreases in accuracy observed in models trained with synthetic data compared with models trained with real data, where both are tested on real data. Such deviations are expected and manageable. Tree-based classifiers have some sensitivity to synthetic data, and the underlying cause requires further investigation. This study highlights the potential of synthetic data and the need for further evaluation of their robustness. Synthetic data must ensure individual privacy and data utility are preserved in order to instill confidence in health care departments when using such data to inform policy decision-making.
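The evaluation protocol at the heart of the study (train on synthetic or real data, test only on held-out real data, compare accuracies) can be expressed compactly. The scikit-learn sketch below uses a random forest as a stand-in for the five model families; the function name and interface are illustrative.

```python
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score

def utility_gap(X_real_train, y_real_train, X_synth, y_synth, X_test, y_test):
    """Accuracy of a model trained on real data vs. one trained on synthetic
    data, both evaluated on the same held-out real records."""
    real_model = RandomForestClassifier(random_state=0).fit(X_real_train, y_real_train)
    synth_model = RandomForestClassifier(random_state=0).fit(X_synth, y_synth)
    acc_real = accuracy_score(y_test, real_model.predict(X_test))
    acc_synth = accuracy_score(y_test, synth_model.predict(X_test))
    return acc_real, acc_synth, acc_real - acc_synth   # gap = utility loss
```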
Project description:While reusing research data has evident benefits for the scientific community as a whole, decisions to archive and share these data are primarily made by individual researchers. In this paper we analyse, within a game-theoretical framework, how sharing and reuse of research data affect individuals who share or do not share their datasets. We construct a model in which sharing a dataset carries a cost whereas reusing such a dataset yields a benefit. In our calculations, conflicting interests appear for researchers: individual researchers are always better off not sharing and thus avoiding the sharing cost, yet both sharing and non-sharing researchers are better off if (almost) all researchers share. Namely, the more researchers share, the more benefit can be gained from the reuse of those datasets. We simulated several policy measures to increase the benefits for researchers who share or reuse datasets. Results point out that, although policies should be able to increase the proportion of researchers who share, and increased discoverability and dataset quality could partly compensate for the cost, a better measure would be to directly lower the cost of sharing, or even turn it into a (citation) benefit. Making data available would in that case become the most profitable, and therefore stable, strategy. This means researchers would willingly make their datasets available, and arguably in the best possible way to enable reuse.
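A toy version of the underlying public-goods game makes the conflict explicit. The payoff function below uses illustrative parameters (a fixed sharing cost and a reuse benefit proportional to the number of shared datasets) rather than the paper's calibrated model: not sharing always pays more for the individual, yet everyone earns more when (almost) everyone shares.

```python
def payoffs(p_share, n_researchers=100, share_cost=1.0, reuse_benefit=0.2):
    """Payoff of a sharing vs. a non-sharing researcher when a fraction
    p_share of n_researchers shares (illustrative parameters only)."""
    shared = p_share * n_researchers
    sharer = reuse_benefit * (shared - 1) - share_cost   # reuses others' data, pays the cost
    non_sharer = reuse_benefit * shared                  # reuses all shared data, pays nothing
    return sharer, non_sharer

# Not sharing dominates at any sharing rate, yet full sharing beats the
# low-sharing status quo for everyone:
print(payoffs(0.1))   # (sharer, non_sharer) payoffs at 10% sharing
print(payoffs(1.0))   # payoffs when (almost) everyone shares
```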
Project description:The world is suffering from the Covid-19 pandemic, which is affecting human lives. Collecting records for Covid-19 patients is necessary to tackle this situation, and decision support systems (DSS) are used to gather those records. Researchers access patient data through the DSS to predict the severity and effects of Covid-19; however, unauthorized users can also access the data for malicious purposes. Protecting Covid-19 patient data is therefore a challenging task. In this paper, we propose a new technique for protecting Covid-19 patients' data. The proposed model is two-fold. First, Blowfish encryption is used to encrypt the identity attributes. Second, pseudonymization is used to mask identity and quasi-attributes; all the data, including the encrypted, masked, sensitive, and non-sensitive attributes, are then linked with one another. In this way, the data become more secure against unauthorized access.
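A minimal sketch of the two-fold scheme might look as follows, assuming the PyCryptodome library for Blowfish-CBC and an HMAC-based keyed hash as the pseudonymization step. Key management, the exact attribute classification, and the linking structure of the actual proposal are not detailed in the description, so the record layout and helper names here are illustrative only.

```python
import hashlib
import hmac
import os

from Crypto.Cipher import Blowfish          # PyCryptodome
from Crypto.Util.Padding import pad

KEY = os.urandom(16)           # secret key for encrypting identity attributes
PSEUDO_SALT = os.urandom(16)   # secret key for pseudonymizing quasi-attributes

def encrypt_identity(value: str) -> bytes:
    """Blowfish-CBC encryption of a direct identifier (e.g., patient name)."""
    iv = os.urandom(Blowfish.block_size)     # 8-byte block size
    cipher = Blowfish.new(KEY, Blowfish.MODE_CBC, iv)
    return iv + cipher.encrypt(pad(value.encode(), Blowfish.block_size))

def pseudonymize(value: str) -> str:
    """Keyed hash so quasi-identifiers stay linkable but not reversible."""
    return hmac.new(PSEUDO_SALT, value.encode(), hashlib.sha256).hexdigest()[:16]

# Illustrative record linking encrypted, masked, and plain attributes.
record = {
    "name": encrypt_identity("Jane Doe"),    # identity attribute (encrypted)
    "zip": pseudonymize("02139"),            # quasi-attribute (masked)
    "severity": "moderate",                  # sensitive attribute kept as-is
}
```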