Privacy-Preserving Federated Learning for Phishing Detection

Impact Factor 1.9; CAS Zone 4 (Engineering & Technology); JCR Q3, Engineering, Electrical & Electronic

IEEE Technology and Society Magazine. Published: 2025-03-08. DOI: 10.1109/MTS.2025.3558971

Amr I. Elkhawas; Thomas M. Chen; Ilir Gashi

Vol. 44, No. 2, pp. 77-84. IEEE Xplore: https://ieeexplore.ieee.org/document/10991968/

Citations: 0

Abstract

Machine learning is one of the most prominent technologies used to combat phishing; however, the large amount of data required to train detection models raises privacy concerns for end users. Collected email or document data may well contain private information, and machine learning models learn from the words and other attributes of these text-based documents. Gathering this information in a centralized location to train models can pose a security risk at every stage of data handling, from transfer to storage. Federated learning is emerging as a promising alternative to traditional centralized machine learning for phishing detection. Its advantages, mainly in privacy and scalability, are weighed against its detection accuracy. Federated learning allows models to be trained locally, with little or no raw data leaving the device; this avoids the privacy exposure that comes with centralized training. However, this alone is not enough to comply with privacy regulations such as the General Data Protection Regulation (GDPR) and the European Union (EU) Artificial Intelligence Act (AI Act); privacy-preserving technologies must be used alongside federated learning to ensure compliance. This article is dedicated to the memory of Professor Thomas Chen and to his aspirations in the field of cybersecurity.
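The abstract's core mechanism, training locally and sharing only model updates, can be illustrated with a small sketch. This is not the authors' method: it is a hypothetical example assuming federated averaging (FedAvg) over a toy bag-of-words logistic-regression phishing classifier, where each client clips its update and can optionally add Gaussian noise, as one example of the complementary privacy-preserving techniques the abstract calls for. The vocabulary, the two clients' emails, and every function name (`email`, `local_train`, `privatize`, `fed_avg_round`, `train`) are illustrative assumptions.

```python
import math
import random
from typing import List, Tuple

# Hypothetical toy vocabulary: each email becomes a 0/1 vector of the tokens it contains.
VOCAB = ["verify", "password", "urgent", "invoice", "meeting", "lunch"]
Example = Tuple[List[float], int]  # (features, label): 1 = phishing, 0 = legitimate


def email(*tokens: str) -> List[float]:
    return [1.0 if t in tokens else 0.0 for t in VOCAB]


# Hypothetical local datasets; in a real deployment these never leave each device.
CLIENT_A: List[Example] = [(email("verify", "password"), 1), (email("meeting", "lunch"), 0),
                           (email("urgent", "password"), 1)]
CLIENT_B: List[Example] = [(email("invoice", "urgent"), 1), (email("lunch"), 0),
                           (email("meeting"), 0)]


def predict(params: List[float], x: List[float]) -> float:
    """Logistic regression; params holds one weight per token plus a trailing bias."""
    z = sum(w * xi for w, xi in zip(params, x)) + params[-1]  # zip stops before the bias
    return 1.0 / (1.0 + math.exp(-z))


def local_train(params: List[float], data: List[Example],
                lr: float = 0.5, epochs: int = 20) -> List[float]:
    """Plain SGD on log-loss, run on the client using only its own emails."""
    p = list(params)
    for _ in range(epochs):
        for x, y in data:
            err = predict(p, x) - y  # derivative of log-loss w.r.t. the logit
            for i, xi in enumerate(x):
                p[i] -= lr * err * xi
            p[-1] -= lr * err
    return p


def privatize(delta: List[float], clip: float, sigma: float,
              rng: random.Random) -> List[float]:
    """Clip the update to L2 norm <= clip, then add Gaussian noise with std sigma * clip."""
    norm = math.sqrt(sum(d * d for d in delta))
    scale = min(1.0, clip / norm) if norm > 0 else 1.0
    return [d * scale + (rng.gauss(0.0, sigma * clip) if sigma > 0 else 0.0)
            for d in delta]


def fed_avg_round(params: List[float], clients: List[List[Example]], clip: float,
                  sigma: float, rng: random.Random) -> List[float]:
    """One FedAvg round: the server sees only each client's clipped (and noised) delta."""
    total = sum(len(data) for data in clients)
    agg = [0.0] * len(params)
    for data in clients:
        local = local_train(params, data)
        delta = privatize([a - b for a, b in zip(local, params)], clip, sigma, rng)
        weight = len(data) / total  # weight each client by its dataset size
        agg = [a + weight * d for a, d in zip(agg, delta)]
    return [p + a for p, a in zip(params, agg)]


def train(rounds: int = 20, clip: float = 5.0, sigma: float = 0.0,
          seed: int = 0) -> List[float]:
    rng = random.Random(seed)
    params = [0.0] * (len(VOCAB) + 1)
    for _ in range(rounds):
        params = fed_avg_round(params, [CLIENT_A, CLIENT_B], clip, sigma, rng)
    return params


if __name__ == "__main__":
    model = train()
    for x, y in CLIENT_A + CLIENT_B:
        print(f"label={y} p(phishing)={predict(model, x):.3f}")
```

With `sigma=0.0` the run is deterministic; a positive `sigma` adds client-side noise that trades some accuracy for privacy, mirroring the privacy/accuracy tension the abstract describes. Calibrating `sigma` to a formal (ε, δ) differential-privacy guarantee would require a privacy accountant, and other techniques such as secure aggregation are not shown.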


Source journal: IEEE Technology and Society Magazine (Engineering & Technology: Electrical & Electronic Engineering)

CiteScore: 3.00

Self-citation rate: 13.60%

Review time: >12 weeks

Journal description: IEEE Technology and Society Magazine invites feature articles (refereed), special articles, and commentaries on topics within the scope of the IEEE Society on Social Implications of Technology, in the broad areas of social implications of electrotechnology, history of electrotechnology, and engineering ethics.