Privacy-preserving record linkage using reference set based encoding: A single parameter method
Sumayya Ziyad, Peter Christen, Anushka Vidanage, Charini Nanayakkara, Rainer Schnell. Information Systems, vol. 133, Article 102569 (published 2025-05-15). DOI: 10.1016/j.is.2025.102569. https://www.sciencedirect.com/science/article/pii/S0306437925000535
Record linkage is the process of matching records that refer to the same entity across two or more databases. In many application areas, ranging from healthcare to government services, the databases to be linked contain sensitive personal information and hence cannot be shared across organisations. Privacy-Preserving Record Linkage (PPRL) aims to overcome this challenge by facilitating the comparison of records that have been encoded or encrypted, thereby allowing linkage without sharing any sensitive data. While various PPRL techniques have been developed, most do not adequately address privacy concerns such as the vulnerability of encoded data to cryptanalysis attacks. Furthermore, existing PPRL methods do not provide a conceptual analysis of how users should set the parameters they require, which can lead to sub-optimal linkage quality and privacy protection. Here we present a novel encoding method for PPRL that employs reference q-gram sets to generate bit arrays that represent sensitive values. Our method requires a single user parameter that determines a trade-off between linkage quality, scalability, and privacy. All other parameters are either data-driven or tightly bounded by the user-set parameter. Furthermore, our method addresses the length, frequency, and pattern-based PPRL vulnerabilities that are exploited by existing PPRL attacks. We conceptually analyse our method and experimentally evaluate it using multiple databases. Our results show that the method robustly achieves both high linkage quality and strong privacy protection.
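The abstract describes the encoding only at a high level. To illustrate the general idea of turning a value's q-grams into a bit array with the help of reference q-gram sets, here is a minimal Python sketch built on assumptions that are not taken from the paper: reference sets are drawn at random from the bigram universe using a seed shared by the database owners, bit i is set when the value shares at least one q-gram with reference set i, and encodings are compared with the Dice coefficient, a similarity measure commonly used in PPRL. The function names, the seed, and the set sizes are hypothetical. This is not the authors' construction, and it does not reproduce their single-parameter scheme or their defences against length, frequency, and pattern-based attacks.

```python
import itertools
import random
import string


def qgrams(value: str, q: int = 2) -> set[str]:
    """Return the set of q-grams of a lower-cased value padded with '_'."""
    padded = "_" + value.lower().strip() + "_"
    return {padded[i:i + q] for i in range(len(padded) - q + 1)}


def make_reference_sets(num_sets: int, set_size: int, q: int = 2,
                        seed: int = 42) -> list[set[str]]:
    """HYPOTHETICAL: draw num_sets random q-gram sets from the q-gram universe.

    Both database owners would need the same seed (a shared secret) so that
    their encodings are comparable. Not the paper's selection procedure.
    """
    alphabet = string.ascii_lowercase + "_"
    universe = ["".join(t) for t in itertools.product(alphabet, repeat=q)]
    rng = random.Random(seed)
    return [set(rng.sample(universe, set_size)) for _ in range(num_sets)]


def encode(value: str, reference_sets: list[set[str]], q: int = 2) -> list[int]:
    """Set bit i if the value shares at least one q-gram with reference set i."""
    grams = qgrams(value, q)
    return [1 if grams & ref else 0 for ref in reference_sets]


def dice(a: list[int], b: list[int]) -> float:
    """Dice coefficient of two equal-length bit arrays (0.0 if both are empty)."""
    common = sum(x & y for x, y in zip(a, b))
    total = sum(a) + sum(b)
    return 2 * common / total if total else 0.0


if __name__ == "__main__":
    refs = make_reference_sets(num_sets=64, set_size=40)
    for other in ("peter", "pete", "maria"):
        print(other, round(dice(encode("peter", refs), encode(other, refs)), 3))
```

Because similar strings share most of their q-grams, they tend to set many of the same bits, so their Dice similarity is typically higher than that of unrelated strings; the exact values depend on the randomly drawn reference sets.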
About the journal:
Information systems are the software and hardware systems that support data-intensive applications. The journal Information Systems publishes articles concerning the design and implementation of languages, data models, process models, algorithms, software and hardware for information systems.
Subject areas include data management issues as presented in the principal international database conferences (e.g., ACM SIGMOD/PODS, VLDB, ICDE and ICDT/EDBT) as well as data-related issues from the fields of data mining/machine learning, information retrieval coordinated with structured data, internet and cloud data management, business process management, web semantics, visual and audio information systems, scientific computing, and data science. Implementation papers on massively parallel data management, fault tolerance in practice, and special-purpose hardware for data-intensive systems are also welcome. Manuscripts from application domains, such as urban informatics, social and natural science, and the Internet of Things, are also welcome. All papers should highlight innovative solutions to data management problems, such as new data models or performance enhancements, and show how those innovations contribute to the goals of the application.