Third-party private set intersection with application to privacy-preserving training of large language models

IF 3.8 | CAS Zone 2, Computer Science | JCR Q2, COMPUTER SCIENCE, INFORMATION SYSTEMS
Zhenhua Liu, Han Liang, Jinhua Wang, Baocang Wang
Journal of Information Security and Applications, Volume 91, Article 104061. Published 2025-04-24.
DOI: 10.1016/j.jisa.2025.104061
https://www.sciencedirect.com/science/article/pii/S2214212625000985
Citations: 0

Abstract

In the training of large language models (LLMs), protecting private datasets is especially crucial. Private set intersection (PSI) is a powerful privacy-preserving collaborative learning technique: it allows participants to collaborate on model training without revealing their own data, thereby meeting the training requirements of LLMs. In this paper, we consider a variant of PSI, namely third-party PSI, in which a third party with no input privately receives the intersection of the other two parties' sets, while the two parties output nothing. We propose a general construction of a third-party PSI protocol from leveled fully homomorphic encryption, which ensures privacy-preserving training of large language models. The proposed construction supports the intersection of arbitrary-length items by using polynomial links, and its security can be proven in the presence of semi-honest adversaries. Compared with existing protocols, the instantiation of the proposed general construction achieves higher computational efficiency while maintaining an equivalent level of communication complexity. More importantly, the proposed protocol offers better utility, safeguarding data privacy without compromising model accuracy.
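The membership test behind many polynomial-based PSI protocols can be illustrated with a small plaintext simulation. This is a sketch under stated assumptions, not the authors' construction. It hashes items to a prime field with SHA-256, which is one common way to handle arbitrary-length items; the abstract's polynomial-link technique is not reproduced. It also replaces the leveled FHE layer with plain modular arithmetic. All function names (`to_field`, `party_a_polynomial`, `masked_responses`, `third_party_intersection`) are hypothetical. In a real deployment, one might have the first party encrypt its polynomial coefficients under the third party's public key so that the second party evaluates them homomorphically. Here everything is in the clear, so the code demonstrates correctness only, not privacy.

```python
import hashlib
import secrets

# Illustrative prime modulus (2^61 - 1, a Mersenne prime). A real leveled-FHE
# scheme such as BFV would use its own plaintext modulus instead.
P = 2**61 - 1


def to_field(item: str) -> int:
    """Hypothetical helper: map an arbitrary-length item to a field element via SHA-256."""
    digest = hashlib.sha256(item.encode("utf-8")).digest()
    return int.from_bytes(digest, "big") % P


def poly_from_roots(roots):
    """Coefficients (lowest degree first) of prod_i (z - r_i) mod P."""
    coeffs = [1]
    for r in roots:
        new = [0] * (len(coeffs) + 1)
        for i, c in enumerate(coeffs):
            new[i] = (new[i] - r * c) % P      # contribution of -r * c * z^i
            new[i + 1] = (new[i + 1] + c) % P  # contribution of c * z^(i+1)
        coeffs = new
    return coeffs


def evaluate(coeffs, z):
    """Horner evaluation of the polynomial at z, mod P."""
    acc = 0
    for c in reversed(coeffs):
        acc = (acc * z + c) % P
    return acc


def party_a_polynomial(set_a):
    """Party A encodes its set as the roots of a polynomial (in practice: encrypted)."""
    return poly_from_roots([to_field(x) for x in set_a])


def masked_responses(coeffs, set_b):
    """For each y in B, return (r1*P(h), h + r2*P(h)) with h = to_field(y).

    In the assumed FHE setting these would be computed on ciphertexts; here
    they are plaintext values, so this only checks the arithmetic.
    """
    out = []
    for y in set_b:
        h = to_field(y)
        v = evaluate(coeffs, h)            # 0 iff h is a root, i.e. y is in A's set
        r1 = secrets.randbelow(P - 1) + 1  # nonzero mask keeps r1*v nonzero when v != 0
        r2 = secrets.randbelow(P)          # uniform mask hides h when v != 0
        out.append((r1 * v % P, (h + r2 * v) % P))
    secrets.SystemRandom().shuffle(out)    # hide the order of B's items
    return out


def third_party_intersection(responses):
    """The third party keeps the tag of every response whose flag is zero."""
    return {tag for flag, tag in responses if flag == 0}


if __name__ == "__main__":
    a = ["alice@example.com", "bob@example.com", "carol@example.com"]
    b = ["bob@example.com", "dave@example.com", "carol@example.com"]
    result = third_party_intersection(masked_responses(party_a_polynomial(a), b))
    print(result == {to_field("bob@example.com"), to_field("carol@example.com")})
```

For an item outside A's set, P(h) is nonzero. Because P is prime, the nonzero mask r1 keeps the first component nonzero, and r2 makes the second component uniformly random. In this simulation the third party therefore recovers only the hashed identifiers of common items (plus the size of B's set). Recovering the original strings would require an extra encoding step that the sketch leaves out.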

Source journal: Journal of Information Security and Applications
Subject area: Computer Science: Computer Networks and Communications
CiteScore: 10.90
Self-citation rate: 5.40%
Articles published: 206
Review time: 56 days
Journal description: Journal of Information Security and Applications (JISA) focuses on original research and practice-driven applications relevant to information security. JISA provides a common link between a vibrant scientific and research community and industry professionals by offering a clear view of modern problems and challenges in information security and by identifying promising scientific and "best-practice" solutions. JISA issues balance original research with innovative industrial approaches from internationally renowned information security experts and researchers.