A privacy-preserving LDA model training scheme based on federated learning

IF 6 | Zone 3, Computer Science | Q1, Computer Science, Information Systems
Hua Shen, Ying Cao, Bai Liu
Internet of Things, volume 32, Article 101620. Published 2025-04-29. DOI: 10.1016/j.iot.2025.101620. Full text: https://www.sciencedirect.com/science/article/pii/S2542660525001349
Citations: 0

Abstract

Latent Dirichlet Allocation (LDA) is a widely used topic modeling technique that extracts the distribution of topics and their associated words from many types of textual data. However, the iterative training of an LDA model risks leaking sensitive text information. In addition, many current LDA training methods are centralized, which poses several challenges: a single training node can struggle to process large volumes of text at once, and it becomes a single point of failure, a potential performance bottleneck, and a target for attackers. To address these issues, this paper introduces an adaptive distributed training framework (FedLDA) that combines federated learning with Collapsed Gibbs Sampling (CGS) for distributed datasets. It further presents a privacy-preserving LDA model training scheme (FedLDA-DP) that combines FedLDA with differential privacy. Analysis and experimental results demonstrate the effectiveness and efficiency of the proposed scheme.
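The abstract does not give the algorithmic details of FedLDA or FedLDA-DP, so the following is only a minimal sketch of the general idea it names: clients run collapsed Gibbs sampling on their own documents, perturb their topic-word counts with Laplace noise, and a server sums the noisy counts. Every function name, the use of the global counts as a shared prior, the assumed sensitivity of 1, and all hyperparameter values are illustrative assumptions, not the authors' method.

```python
import random

def laplace(scale, rng):
    # The difference of two i.i.d. exponentials is Laplace(0, scale).
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)

def local_cgs(docs, global_nkw, n_topics, vocab_size, rng,
              alpha=0.1, beta=0.01, sweeps=10):
    """Collapsed Gibbs sampling on one client's documents (lists of word ids).

    global_nkw holds the server's current topic-word counts and is used as a
    shared prior (an assumption of this sketch); the returned matrix contains
    only this client's counts.
    """
    nkw = [[0] * vocab_size for _ in range(n_topics)]  # local topic-word counts
    nk = [0] * n_topics                                 # local tokens per topic
    ndk = [[0] * n_topics for _ in docs]                # document-topic counts
    gk = [sum(row) for row in global_nkw]               # global tokens per topic
    z = []
    for d, doc in enumerate(docs):                      # random initial topics
        z.append([rng.randrange(n_topics) for _ in doc])
        for w, k in zip(doc, z[d]):
            nkw[k][w] += 1
            nk[k] += 1
            ndk[d][k] += 1
    for _ in range(sweeps):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                k = z[d][i]                             # remove current assignment
                nkw[k][w] -= 1
                nk[k] -= 1
                ndk[d][k] -= 1
                # Collapsed conditional p(z = t | everything else), unnormalized.
                weights = [
                    (ndk[d][t] + alpha)
                    * (global_nkw[t][w] + nkw[t][w] + beta)
                    / (gk[t] + nk[t] + vocab_size * beta)
                    for t in range(n_topics)
                ]
                k = rng.choices(range(n_topics), weights=weights)[0]
                z[d][i] = k                             # record the new topic
                nkw[k][w] += 1
                nk[k] += 1
                ndk[d][k] += 1
    return nkw

def privatize(nkw, epsilon, rng):
    # Laplace noise with scale 1/epsilon per entry (sensitivity 1 is assumed).
    return [[c + laplace(1.0 / epsilon, rng) for c in row] for row in nkw]

def aggregate(client_counts):
    # Element-wise sum over clients; clip negatives introduced by the noise.
    n_topics, vocab_size = len(client_counts[0]), len(client_counts[0][0])
    return [[max(0.0, sum(c[t][w] for c in client_counts))
             for w in range(vocab_size)] for t in range(n_topics)]

def train(clients, n_topics, vocab_size, epsilon, rounds=3, seed=0):
    rng = random.Random(seed)
    global_nkw = [[0.0] * vocab_size for _ in range(n_topics)]
    for _ in range(rounds):
        updates = [
            privatize(local_cgs(docs, global_nkw, n_topics, vocab_size, rng),
                      epsilon, rng)
            for docs in clients
        ]
        global_nkw = aggregate(updates)
    return global_nkw
```

Calling `train(clients, n_topics=2, vocab_size=6, epsilon=1.0)` with `clients` given as a list of per-client document lists returns the aggregated topic-word count matrix. In this sketch `epsilon` is applied per client per round with no privacy accounting across rounds, so it should not be read as a formal differential privacy guarantee.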
Source journal
Internet of Things
CiteScore: 3.60
Self-citation rate: 5.10%
Articles published: 115
Review time: 37 days
Journal description: Internet of Things; Engineering Cyber Physical Human Systems is a comprehensive journal encouraging cross-collaboration between researchers, engineers and practitioners in the field of IoT and Cyber Physical Human Systems. The journal offers a platform for exchanging scientific information across the entire breadth of technology, science, and societal applications of the IoT. It places a high priority on timely publication and aims to provide a home for high-quality work. The journal is also interested in publishing topical Special Issues on any aspect of IoT.