dc.contributor.author
Pilgram, Lisa
dc.contributor.author
Meurers, Thierry
dc.contributor.author
Malin, Bradley
dc.contributor.author
Schaeffner, Elke
dc.contributor.author
Eckardt, Kai-Uwe
dc.contributor.author
Prasser, Fabian
dc.contributor.author
GCKD Investigators
dc.date.accessioned
2025-07-25T10:51:44Z
dc.date.available
2025-07-25T10:51:44Z
dc.identifier.uri
https://refubium.fu-berlin.de/handle/fub188/48302
dc.identifier.uri
http://dx.doi.org/10.17169/refubium-48025
dc.description.abstract
Background:
Sharing data from clinical studies can accelerate scientific progress, improve transparency, and increase the potential for innovation and collaboration. However, privacy concerns remain a barrier to data sharing. Certain concerns, such as reidentification risk, can be addressed through the application of anonymization algorithms, whereby data are altered so that they can no longer reasonably be related to a person. Yet, such alterations have the potential to influence the data set’s statistical properties, such that the privacy-utility trade-off must be considered. This has been studied in theory, but evidence based on real-world individual-level clinical data is rare, and anonymization has not been broadly adopted in clinical practice.
Objective:
The goal of this study is to contribute to a better understanding of anonymization in the real world by comprehensively evaluating the privacy-utility trade-off of differently anonymized data using data and scientific results from the German Chronic Kidney Disease (GCKD) study.
Methods:
The GCKD data set extracted for this study consists of 5217 records and 70 variables. A 2-step procedure was followed to determine which variables constituted reidentification risks. To capture a large portion of the risk-utility space, we decided on risk thresholds ranging from 0.02 to 1. The data were then transformed via generalization and suppression, and the anonymization process was varied using a generic and a use case–specific configuration. To assess the utility of the anonymized GCKD data, general-purpose metrics (ie, data granularity and entropy), as well as use case–specific metrics (ie, reproducibility), were applied. Reproducibility was assessed by measuring the overlap of the 95% CI lengths between anonymized and original results.
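As an illustration of the reproducibility metric described above, the following is a minimal sketch of one common way to quantify the overlap between two confidence intervals: the shared length is divided by each interval's own length, and the two ratios are averaged. The abstract does not state the exact formula the study used, so this definition, the function name `ci_overlap`, and the choice to report 0 for disjoint intervals are assumptions made for illustration only.

```python
def ci_overlap(original, anonymized):
    """Average relative overlap of two intervals given as (lower, upper).

    Assumed definition (not taken from the study): the shared length is
    divided by each interval's length and the two ratios are averaged.
    Disjoint or merely touching intervals are reported as 0.0.
    """
    (lo_o, hi_o), (lo_a, hi_a) = original, anonymized
    shared = min(hi_o, hi_a) - max(lo_o, lo_a)
    if shared <= 0:
        # No common region between the two intervals.
        return 0.0
    return 0.5 * (shared / (hi_o - lo_o) + shared / (hi_a - lo_a))


# Hypothetical intervals, chosen only to exercise the function.
identical = ci_overlap((1.0, 2.0), (1.0, 2.0))
shifted = ci_overlap((1.0, 2.0), (1.5, 2.5))
disjoint = ci_overlap((1.0, 2.0), (3.0, 4.0))
```

Under this definition, identical intervals yield 1.0, intervals that share half of their length yield 0.5, and nonoverlapping intervals yield 0.0, which matches the way the results distinguish high, partial, and absent overlap.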
Results:
Reproducibility measured by 95% CI overlap was higher than utility obtained from general-purpose metrics. For example, granularity varied between 68.2% and 87.6%, and entropy varied between 25.5% and 46.2%, whereas the average 95% CI overlap was above 90% for all risk thresholds applied. A nonoverlapping 95% CI was detected in 6 estimates across all analyses, but the overwhelming majority of estimates exhibited an overlap over 50%. The use case–specific configuration outperformed the generic one in terms of actual utility (ie, reproducibility) at the same level of privacy.
Conclusions:
Our results illustrate the challenges that anonymization faces when aiming to support multiple likely and possibly competing uses, while use case–specific anonymization can provide greater utility. This aspect should be taken into account when evaluating the associated costs of anonymized data and attempting to maintain sufficiently high levels of privacy for anonymized data.
en
dc.subject
data sharing
en
dc.subject
anonymization
en
dc.subject
deidentification
en
dc.subject
privacy-utility trade-off
en
dc.subject
privacy-enhancing technologies
en
dc.subject
medical informatics
en
dc.subject
identification
en
dc.subject
confidentiality
en
dc.subject
data science
en
dc.subject.ddc
600 Technology, Medicine, Applied Sciences::610 Medicine and Health::610 Medicine and Health
dc.title
The Costs of Anonymization: Case Study Using Clinical Data
dc.type
Scientific article
dcterms.bibliographicCitation.articlenumber
e49445
dcterms.bibliographicCitation.doi
10.2196/49445
dcterms.bibliographicCitation.journaltitle
Journal of Medical Internet Research
dcterms.bibliographicCitation.originalpublishername
JMIR Publications
dcterms.bibliographicCitation.volume
26
refubium.affiliation
Charité - Universitätsmedizin Berlin
refubium.resourceType.isindependentpub
no
dcterms.accessRights.openaire
open access
dcterms.bibliographicCitation.pmid
38657232
dcterms.isPartOf.eissn
1438-8871