Anonymization of Health Insurance Claims Data for Medication Safety Assessments
Mehmed Halilovic, Karen Otte, Thierry Meurers, Marco Alibone, Marion Ludwig, Nico Riedel, Steven Wolter, Lisa Kühnel, Steffen Hess, Fabian Prasser
Studies in Health Technology and Informatics, vol. 331, pp. 283-291. Journal Article, published 2025-09-03. DOI: 10.3233/SHTI251407 (https://doi.org/10.3233/SHTI251407)
Abstract
Introduction: The re-use of health insurance claims data for research purposes can provide valuable insights to improve patient care. However, as health data is often highly sensitive and subject to strict regulatory frameworks, the privacy of individuals must be protected. Anonymization is a common approach to do so, but finding an effective strategy is challenging due to an inherent trade-off between privacy protection and data utility. A structured approach is needed to balance these objectives and guide the selection of appropriate anonymization strategies.
Methods: In this paper, we present a systematic evaluation of twelve anonymization strategies applied to German health insurance claims data that was previously used in a drug safety study. The dataset consisted of 1727 records and 45 variables. Based on structured threat modeling, we compare a conservative approach and a threat modeling-based approach, each applied with six different privacy models and risk thresholds using the ARX Data Anonymization Tool. We assess general data utility and empirically evaluate residual privacy risks using both the Anonymeter framework and a membership inference attack.
Results: Our results show that conservative anonymization ensures strong privacy protection but reduces data utility. In contrast, the threat modeling-based approach retains more utility while still providing acceptable privacy under moderate thresholds.
Conclusion: The proposed process enables a systematic comparison of privacy-utility trade-offs and can be adapted to other medical datasets. Our findings highlight the importance of context-specific anonymization strategies and empirical risk evaluation to guide anonymized data sharing in healthcare.
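
To make the notion of a risk threshold in the Methods more concrete, the following is a minimal sketch of how k-anonymity and a simple re-identification risk could be checked on a toy table. It is not the authors' pipeline and does not use the ARX API. The records, the quasi-identifier names (`age_band`, `sex`, `region`), and all function names are hypothetical, and taking a record's risk as 1 divided by the size of its equivalence class is an assumption made for illustration.

```python
from collections import Counter


def equivalence_class_sizes(records, quasi_identifiers):
    """Count how many records share each combination of quasi-identifier values."""
    return Counter(tuple(r[q] for q in quasi_identifiers) for r in records)


def reidentification_risks(records, quasi_identifiers):
    """Return (highest, average) risk, with per-record risk = 1 / class size.

    Assumption for illustration: the attacker knows the target is in the data.
    """
    sizes = equivalence_class_sizes(records, quasi_identifiers)
    per_record = [
        1 / sizes[tuple(r[q] for q in quasi_identifiers)] for r in records
    ]
    return max(per_record), sum(per_record) / len(per_record)


def satisfies_k_anonymity(records, quasi_identifiers, k):
    """True if every equivalence class contains at least k records."""
    sizes = equivalence_class_sizes(records, quasi_identifiers)
    return min(sizes.values()) >= k


# Invented toy records; not taken from the study's dataset.
records = [
    {"age_band": "60-69", "sex": "F", "region": "North"},
    {"age_band": "60-69", "sex": "F", "region": "North"},
    {"age_band": "60-69", "sex": "M", "region": "North"},
    {"age_band": "70-79", "sex": "M", "region": "South"},
    {"age_band": "70-79", "sex": "M", "region": "South"},
]

# Strict setting: all three attributes are treated as quasi-identifiers.
strict_qi = ["age_band", "sex", "region"]
strict_highest, strict_average = reidentification_risks(records, strict_qi)
strict_ok = satisfies_k_anonymity(records, strict_qi, k=2)

# Relaxed setting: "sex" is suppressed, so it no longer separates records.
relaxed_qi = ["age_band", "region"]
relaxed_highest, relaxed_average = reidentification_risks(records, relaxed_qi)
relaxed_ok = satisfies_k_anonymity(records, relaxed_qi, k=2)

print(strict_ok, strict_highest, strict_average)
print(relaxed_ok, relaxed_highest, relaxed_average)
```

In this toy case, one record is unique under the strict set of quasi-identifiers, so 2-anonymity fails; suppressing one attribute merges classes and lowers the highest risk, at the cost of losing that attribute for analysis. This mirrors, in a very simplified way, the privacy-utility trade-off the abstract describes.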