dc.contributor.author
Wirth, Felix Nikolaus
dc.contributor.author
Meurers, Thierry
dc.contributor.author
Johns, Marco
dc.contributor.author
Prasser, Fabian
dc.date.accessioned
2023-03-15T15:28:15Z
dc.date.available
2023-03-15T15:28:15Z
dc.identifier.uri
https://refubium.fu-berlin.de/handle/fub188/38410
dc.identifier.uri
http://dx.doi.org/10.17169/refubium-38128
dc.description.abstract
Background: Data sharing is considered a crucial part of modern medical research. Unfortunately, despite its advantages, it often faces obstacles, especially data privacy challenges. As a result, various approaches and infrastructures have been developed that aim to ensure that patients and research participants remain anonymous when data is shared. However, privacy protection typically comes at a cost, e.g. restrictions regarding the types of analyses that can be performed on shared data. What is lacking is a systematization making the trade-offs taken by different approaches transparent. The aim of the work described in this paper was to develop a systematization for the degree of privacy protection provided and the trade-offs taken by different data sharing methods. Based on this contribution, we categorized popular data sharing approaches and identified research gaps by analyzing combinations of promising properties and features that are not yet supported by existing approaches.
Methods: The systematization consists of different axes. Three axes relate to privacy protection aspects and were adopted from the popular Five Safes Framework: (1) safe data, addressing privacy at the input level, (2) safe settings, addressing privacy during shared processing, and (3) safe outputs, addressing privacy protection of analysis results. Three additional axes address the usefulness of approaches: (4) support for de-duplication, to enable the reconciliation of data belonging to the same individuals, (5) flexibility, to be able to adapt to different data analysis requirements, and (6) scalability, to maintain performance with increasing complexity of shared data or common analysis processes.
Results: Using the systematization, we identified three different categories of approaches: distributed data analyses, which exchange anonymous aggregated data, secure multi-party computation protocols, which exchange encrypted data, and data enclaves, which store pooled individual-level data in secure environments for access for analysis purposes. We identified important research gaps, including a lack of approaches enabling the de-duplication of horizontally distributed data or providing a high degree of flexibility.
Conclusions: There are fundamental differences between different data sharing approaches and several gaps in their functionality that may be interesting to investigate in future work. Our systematization can make the properties of privacy-preserving data sharing infrastructures more transparent and support decision makers and regulatory authorities with a better understanding of the trade-offs taken.
en
dc.rights.uri
https://creativecommons.org/licenses/by/4.0/
dc.subject
Biomedical data sharing
en
dc.subject
Systematization
en
dc.subject
Distributed computing
en
dc.subject
Secure multi-party computing
en
dc.subject
Data enclave
en
dc.subject.ddc
600 Technology, medicine, applied sciences::610 Medicine and health::610 Medicine and health
dc.title
Privacy-preserving data sharing infrastructures for medical research: systematization and comparison
dc.type
Scientific article
dcterms.bibliographicCitation.articlenumber
242
dcterms.bibliographicCitation.doi
10.1186/s12911-021-01602-x
dcterms.bibliographicCitation.journaltitle
BMC Medical Informatics and Decision Making
dcterms.bibliographicCitation.originalpublishername
Springer Nature
dcterms.bibliographicCitation.volume
21
refubium.affiliation
Charité - Universitätsmedizin Berlin
refubium.funding
Springer Nature DEAL
refubium.resourceType.isindependentpub
no
dcterms.accessRights.openaire
open access
dcterms.bibliographicCitation.pmid
34384406
dcterms.isPartOf.eissn
1472-6947