A privacy-preserving LDA model training scheme based on federated learning
Authors: Hua Shen, Ying Cao, Bai Liu
DOI: 10.1016/j.iot.2025.101620
Journal: Internet of Things, Volume 32, Article 101620
Published: 2025-04-29 (Journal Article)
Impact Factor: 6.0 · JCR: Q1 (Computer Science, Information Systems)
URL: https://www.sciencedirect.com/science/article/pii/S2542660525001349
Citations: 0
Abstract
Latent Dirichlet Allocation (LDA) is a widely used topic modeling technique that effectively extracts the distribution of topics and their associated words from various types of textual data. However, the iterative training of an LDA model risks leaking sensitive text information. Additionally, many current LDA training methods rely on centralized training, which poses several challenges: a single training node can struggle to process large volumes of text simultaneously, and it also becomes a single point of failure, a potential performance bottleneck, and a target for attackers. To address these issues, this paper introduces an adaptive distributed training framework (FedLDA) that combines federated learning with Collapsed Gibbs Sampling (CGS) for distributed datasets. Furthermore, we present a privacy-preserving LDA model training scheme (FedLDA-DP) that combines FedLDA with differential privacy. Analysis and experimental results demonstrate the effectiveness and efficiency of the proposed scheme.
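The abstract does not detail FedLDA-DP's construction, but the general pattern it describes (clients perturb local CGS statistics with differential-privacy noise before a server aggregates them) can be illustrated with a minimal sketch. All function names, the Laplace mechanism choice, and the matrix shapes below are illustrative assumptions, not the paper's actual scheme.

```python
import numpy as np

rng = np.random.default_rng(0)

def noisy_counts(word_topic_counts, epsilon, sensitivity=1.0, rng=rng):
    """Client-side step: add Laplace noise to the local word-topic count
    matrix produced by Collapsed Gibbs Sampling, so the server never sees
    exact local statistics. A generic Laplace-mechanism sketch only."""
    scale = sensitivity / epsilon
    noise = rng.laplace(0.0, scale, size=word_topic_counts.shape)
    return word_topic_counts + noise

def federated_aggregate(client_matrices):
    """Server-side step: sum the (noisy) word-topic counts from all
    clients to form the global statistics for the next sampling round."""
    return np.sum(client_matrices, axis=0)

# Two hypothetical clients, vocabulary of 5 words, 3 topics.
clients = [rng.integers(0, 10, size=(5, 3)).astype(float) for _ in range(2)]
global_counts = federated_aggregate(
    [noisy_counts(c, epsilon=1.0) for c in clients]
)
```

Because only perturbed aggregates leave each client, the raw text and exact topic assignments stay local, which is the privacy property the scheme targets; smaller `epsilon` means more noise and stronger privacy at the cost of model quality.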
About the journal:
Internet of Things; Engineering Cyber Physical Human Systems is a comprehensive journal encouraging collaboration among researchers, engineers, and practitioners in the field of IoT and Cyber Physical Human Systems. The journal offers a unique platform to exchange scientific information on the entire breadth of technology, science, and societal applications of the IoT.
The journal places a high priority on timely publication and provides a home for high-quality research.
Furthermore, the journal is interested in publishing topical Special Issues on any aspect of IoT.