Privacy-preserving record linkage using reference set based encoding: A single parameter method

IF 3.4 · CAS zone 2, Computer Science · JCR Q2 COMPUTER SCIENCE, INFORMATION SYSTEMS

Information Systems, vol. 133, Article 102569 · Published: 2025-05-15 · DOI: 10.1016/j.is.2025.102569

Sumayya Ziyad, Peter Christen, Anushka Vidanage, Charini Nanayakkara, Rainer Schnell


Citations: 0

Abstract





Record linkage is the process of matching records that refer to the same entity across two or more databases. In many application areas, ranging from healthcare to government services, the databases to be linked contain sensitive personal information, and hence, cannot be shared across organisations. Privacy-Preserving Record Linkage (PPRL) aims to overcome this challenge by facilitating the comparison of records that have been encoded or encrypted, thereby allowing linkage without the need of sharing any sensitive data. While various PPRL techniques have been developed, most of them do not properly address privacy concerns, such as the various vulnerabilities of encoded data with regard to cryptanalysis attacks. Existing PPRL methods, furthermore, do not provide conceptual analyses of how a user should set the various parameters required, possibly leading to sub-optimal results with regard to both linkage quality and privacy protection. Here we present a novel encoding method for PPRL that employs reference q-gram sets to generate bit arrays that represent sensitive values. Our method requires a single user parameter that determines a trade-off between linkage quality, scalability, and privacy. All other parameters are either data driven or have strong bounds based on the user-set parameter. Furthermore, our method addresses the length, frequency, and pattern-based PPRL vulnerabilities that are exploited by existing PPRL attacks. We conceptually analyse our method and experimentally evaluate it using multiple databases. Our results show that our method provides robust results for both high linkage quality and strong privacy protection.
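The abstract does not say how the reference q-gram sets are built or how a value is mapped to bits, so the following Python sketch is only a rough illustration of the general idea of encoding a value against reference sets, under loudly stated assumptions: one bit per reference set, set to 1 when the value shares at least one q-gram with that set, and encodings compared with the Dice coefficient commonly used in PPRL. The function names (qgrams, encode, dice), the "_" padding, q = 2 and the toy REFERENCE_SETS are all hypothetical; this is not the authors' encoding.

```python
from typing import List, Set

# Toy reference q-gram sets (hypothetical). A real method would derive these
# from data; here they are hand-picked purely for illustration.
REFERENCE_SETS: List[Set[str]] = [
    {"_p", "pe"},
    {"er", "r_"},
    {"jo", "oh"},
    {"et", "te"},
]


def qgrams(value: str, q: int = 2) -> Set[str]:
    """Return the set of character q-grams of a lower-cased value padded with '_'."""
    padded = "_" + value.lower() + "_"
    return {padded[i:i + q] for i in range(len(padded) - q + 1)}


def encode(value: str, reference_sets: List[Set[str]], q: int = 2) -> List[int]:
    """Hypothetical encoding: bit i is 1 if the value shares any q-gram with reference set i."""
    grams = qgrams(value, q)
    return [1 if grams & ref else 0 for ref in reference_sets]


def dice(a: List[int], b: List[int]) -> float:
    """Dice coefficient 2|A and B| / (|A| + |B|) over the 1-bits of two equal-length bit arrays."""
    if len(a) != len(b):
        raise ValueError("bit arrays must have the same length")
    common = sum(x & y for x, y in zip(a, b))
    total = sum(a) + sum(b)
    return 2 * common / total if total else 0.0


if __name__ == "__main__":
    ba = encode("peter", REFERENCE_SETS)
    bb = encode("pete", REFERENCE_SETS)
    print(ba, bb, dice(ba, bb))  # [1, 1, 0, 1] [1, 0, 0, 1] 0.8
```

In this toy version, similar names ("peter", "pete") still yield similar bit arrays, which is what makes linkage on encoded values possible. However, the number of set bits still tracks value length and q-gram frequency, which is exactly the kind of length and frequency signal the abstract says the authors' method is designed to address, so the sketch should not be read as privacy-preserving.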


Source journal

Information Systems · Engineering & Technology, Computer Science: Information Systems

CiteScore: 9.40
Self-citation rate: 2.70%
Articles published: 112
Review time: 53 days

Journal description: Information systems are the software and hardware systems that support data-intensive applications. The journal Information Systems publishes articles concerning the design and implementation of languages, data models, process models, algorithms, software and hardware for information systems. Subject areas include data management issues as presented in the principal international database conferences (e.g., ACM SIGMOD/PODS, VLDB, ICDE and ICDT/EDBT) as well as data-related issues from the fields of data mining/machine learning, information retrieval coordinated with structured data, internet and cloud data management, business process management, web semantics, visual and audio information systems, scientific computing, and data science. Implementation papers having to do with massively parallel data management, fault tolerance in practice, and special purpose hardware for data-intensive systems are also welcome. Manuscripts from application domains, such as urban informatics, social and natural science, and Internet of Things, are also welcome. All papers should highlight innovative solutions to data management problems such as new data models, performance enhancements, and show how those innovations contribute to the goals of the application.