基于联合学习的自然语言处理：系统文献综述

IF 10.7 2区计算机科学 Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE

Artificial Intelligence Review Pub Date : 2024-10-12 DOI:10.1007/s10462-024-10970-5

Younas Khan, David Sánchez, Josep Domingo-Ferrer

{"title":"基于联合学习的自然语言处理：系统文献综述","authors":"Younas Khan, David Sánchez, Josep Domingo-Ferrer","doi":"10.1007/s10462-024-10970-5","DOIUrl":null,"url":null,"abstract":"<div><p>Federated learning (FL) is a decentralized machine learning (ML) framework that allows models to be trained without sharing the participants’ local data. FL thus preserves privacy better than centralized machine learning. Since textual data (such as clinical records, posts in social networks, or search queries) often contain personal information, many natural language processing (NLP) tasks dealing with such data have shifted from the centralized to the FL setting. However, FL is not free from issues, including convergence and security vulnerabilities (due to unreliable or poisoned data introduced into the model), communication and computation bottlenecks, and even privacy attacks orchestrated by honest-but-curious servers. In this paper, we present a systematic literature review (SLR) of NLP applications in FL with a special focus on FL issues and the solutions proposed so far. Our review surveys 36 recent papers published in relevant venues, which are systematically analyzed and compared from multiple perspectives. As a result of the survey, we also identify the most outstanding challenges in the area.</p></div>","PeriodicalId":8449,"journal":{"name":"Artificial Intelligence Review","volume":"57 12","pages":""},"PeriodicalIF":10.7000,"publicationDate":"2024-10-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://link.springer.com/content/pdf/10.1007/s10462-024-10970-5.pdf","citationCount":"0","resultStr":"{\"title\":\"Federated learning-based natural language processing: a systematic literature review\",\"authors\":\"Younas Khan, David Sánchez, Josep Domingo-Ferrer\",\"doi\":\"10.1007/s10462-024-10970-5\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"<div><p>Federated learning (FL) is a decentralized machine learning (ML) framework that allows models to be trained without sharing the participants’ local data. FL thus preserves privacy better than centralized machine learning. Since textual data (such as clinical records, posts in social networks, or search queries) often contain personal information, many natural language processing (NLP) tasks dealing with such data have shifted from the centralized to the FL setting. However, FL is not free from issues, including convergence and security vulnerabilities (due to unreliable or poisoned data introduced into the model), communication and computation bottlenecks, and even privacy attacks orchestrated by honest-but-curious servers. In this paper, we present a systematic literature review (SLR) of NLP applications in FL with a special focus on FL issues and the solutions proposed so far. Our review surveys 36 recent papers published in relevant venues, which are systematically analyzed and compared from multiple perspectives. As a result of the survey, we also identify the most outstanding challenges in the area.</p></div>\",\"PeriodicalId\":8449,\"journal\":{\"name\":\"Artificial Intelligence Review\",\"volume\":\"57 12\",\"pages\":\"\"},\"PeriodicalIF\":10.7000,\"publicationDate\":\"2024-10-12\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"https://link.springer.com/content/pdf/10.1007/s10462-024-10970-5.pdf\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Artificial Intelligence Review\",\"FirstCategoryId\":\"94\",\"ListUrlMain\":\"https://link.springer.com/article/10.1007/s10462-024-10970-5\",\"RegionNum\":2,\"RegionCategory\":\"计算机科学\",\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"Q1\",\"JCRName\":\"COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Artificial Intelligence Review","FirstCategoryId":"94","ListUrlMain":"https://link.springer.com/article/10.1007/s10462-024-10970-5","RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE","Score":null,"Total":0}

引用次数: 0

摘要

联合学习（FL）是一种去中心化的机器学习（ML）框架，它允许在不共享参与者本地数据的情况下训练模型。因此，FL 比集中式机器学习更能保护隐私。由于文本数据（如临床记录、社交网络中的帖子或搜索查询）通常包含个人信息，许多处理此类数据的自然语言处理（NLP）任务已从集中式转为分散式。然而，FL 并非没有问题，包括收敛性和安全漏洞（由于模型中引入了不可靠或有毒数据）、通信和计算瓶颈，甚至由诚实但好奇的服务器策划的隐私攻击。在本文中，我们对 FL 中的 NLP 应用进行了系统的文献综述（SLR），特别关注 FL 问题和迄今为止提出的解决方案。我们的综述调查了最近在相关刊物上发表的 36 篇论文，并从多个角度对这些论文进行了系统分析和比较。通过调查，我们还确定了该领域最突出的挑战。

本文章由计算机程序翻译，如有差异，请以英文原文为准。

查看原文本刊更多论文

Federated learning-based natural language processing: a systematic literature review

Federated learning (FL) is a decentralized machine learning (ML) framework that allows models to be trained without sharing the participants’ local data. FL thus preserves privacy better than centralized machine learning. Since textual data (such as clinical records, posts in social networks, or search queries) often contain personal information, many natural language processing (NLP) tasks dealing with such data have shifted from the centralized to the FL setting. However, FL is not free from issues, including convergence and security vulnerabilities (due to unreliable or poisoned data introduced into the model), communication and computation bottlenecks, and even privacy attacks orchestrated by honest-but-curious servers. In this paper, we present a systematic literature review (SLR) of NLP applications in FL with a special focus on FL issues and the solutions proposed so far. Our review surveys 36 recent papers published in relevant venues, which are systematically analyzed and compared from multiple perspectives. As a result of the survey, we also identify the most outstanding challenges in the area.

求助全文

通过发布文献求助，成功后即可免费获取论文全文。去求助

来源期刊

Artificial Intelligence Review 工程技术-计算机：人工智能

CiteScore

22.00

自引率

3.30%

发文量

194

审稿时长

5.3 months

期刊介绍： Artificial Intelligence Review, a fully open access journal, publishes cutting-edge research in artificial intelligence and cognitive science. It features critical evaluations of applications, techniques, and algorithms, providing a platform for both researchers and application developers. The journal includes refereed survey and tutorial articles, along with reviews and commentary on significant developments in the field.