在非iid数据的联邦学习中使用多任务自动编码器进行样本选择

IF 5.1 2区工程技术 Q1 ENGINEERING, MULTIDISCIPLINARY

Engineering Science and Technology-An International Journal-Jestech Pub Date : 2025-01-01 DOI:10.1016/j.jestch.2024.101920

Emre Ardıç, Yakup Genç

{"title":"在非iid数据的联邦学习中使用多任务自动编码器进行样本选择","authors":"Emre Ardıç, Yakup Genç","doi":"10.1016/j.jestch.2024.101920","DOIUrl":null,"url":null,"abstract":"<div><div>Federated learning is a machine learning paradigm in which multiple devices collaboratively train a model under the supervision of a central server while ensuring data privacy. However, its performance is often hindered by redundant, malicious, or abnormal samples, leading to model degradation and inefficiency. To overcome these issues, we propose novel sample selection methods for image classification, employing a multi-task autoencoder to estimate sample contributions through loss and feature analysis. Our approach incorporates unsupervised outlier detection, using one-class support vector machine (OCSVM), isolation forest (IF), and adaptive loss threshold (AT) methods managed by a central server to filter noisy samples on clients. We also propose a multi-class deep support vector data description (SVDD) loss controlled by a central server to enhance feature-based sample selection. We validate our methods on CIFAR10 and MNIST datasets across varying numbers of clients, non-IID distributions, and noise levels up to 40%. The results show significant accuracy improvements with loss-based sample selection, achieving gains of up to 7.02% on CIFAR10 with OCSVM and 1.83% on MNIST with AT. Additionally, our federated SVDD loss further improves feature-based sample selection, yielding accuracy gains of up to 0.99% on CIFAR10 with OCSVM. These results show the effectiveness of our methods in improving model accuracy across various client counts and noise conditions.</div></div>","PeriodicalId":48609,"journal":{"name":"Engineering Science and Technology-An International Journal-Jestech","volume":"61 ","pages":"Article 101920"},"PeriodicalIF":5.1000,"publicationDate":"2025-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"Sample selection using multi-task autoencoders in federated learning with non-IID data\",\"authors\":\"Emre Ardıç, Yakup Genç\",\"doi\":\"10.1016/j.jestch.2024.101920\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"<div><div>Federated learning is a machine learning paradigm in which multiple devices collaboratively train a model under the supervision of a central server while ensuring data privacy. However, its performance is often hindered by redundant, malicious, or abnormal samples, leading to model degradation and inefficiency. To overcome these issues, we propose novel sample selection methods for image classification, employing a multi-task autoencoder to estimate sample contributions through loss and feature analysis. Our approach incorporates unsupervised outlier detection, using one-class support vector machine (OCSVM), isolation forest (IF), and adaptive loss threshold (AT) methods managed by a central server to filter noisy samples on clients. We also propose a multi-class deep support vector data description (SVDD) loss controlled by a central server to enhance feature-based sample selection. We validate our methods on CIFAR10 and MNIST datasets across varying numbers of clients, non-IID distributions, and noise levels up to 40%. The results show significant accuracy improvements with loss-based sample selection, achieving gains of up to 7.02% on CIFAR10 with OCSVM and 1.83% on MNIST with AT. Additionally, our federated SVDD loss further improves feature-based sample selection, yielding accuracy gains of up to 0.99% on CIFAR10 with OCSVM. These results show the effectiveness of our methods in improving model accuracy across various client counts and noise conditions.</div></div>\",\"PeriodicalId\":48609,\"journal\":{\"name\":\"Engineering Science and Technology-An International Journal-Jestech\",\"volume\":\"61 \",\"pages\":\"Article 101920\"},\"PeriodicalIF\":5.1000,\"publicationDate\":\"2025-01-01\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Engineering Science and Technology-An International Journal-Jestech\",\"FirstCategoryId\":\"5\",\"ListUrlMain\":\"https://www.sciencedirect.com/science/article/pii/S2215098624003069\",\"RegionNum\":2,\"RegionCategory\":\"工程技术\",\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"Q1\",\"JCRName\":\"ENGINEERING, MULTIDISCIPLINARY\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Engineering Science and Technology-An International Journal-Jestech","FirstCategoryId":"5","ListUrlMain":"https://www.sciencedirect.com/science/article/pii/S2215098624003069","RegionNum":2,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"ENGINEERING, MULTIDISCIPLINARY","Score":null,"Total":0}

引用次数: 0

摘要

联邦学习是一种机器学习范例，其中多个设备在中央服务器的监督下协同训练模型，同时确保数据隐私。然而，它的性能经常受到冗余、恶意或异常样本的阻碍，导致模型退化和效率低下。为了克服这些问题，我们提出了新的图像分类样本选择方法，采用多任务自动编码器通过损失和特征分析来估计样本贡献。我们的方法结合了无监督异常值检测，使用一类支持向量机（OCSVM）、隔离森林（IF）和自适应损失阈值（AT）方法，由中央服务器管理，以过滤客户端上的噪声样本。我们还提出了一种由中央服务器控制的多类深度支持向量数据描述（SVDD）损失，以增强基于特征的样本选择。我们在CIFAR10和MNIST数据集上验证了我们的方法，这些数据集跨越不同数量的客户端、非iid分布和高达40%的噪声水平。结果表明，基于损失的样本选择显著提高了准确率，OCSVM在CIFAR10上的增益高达7.02%，AT在MNIST上的增益高达1.83%。此外，我们的联合SVDD损失进一步改进了基于特征的样本选择，在使用OCSVM的CIFAR10上获得了高达0.99%的准确率增益。这些结果表明，我们的方法在提高不同客户数量和噪声条件下的模型精度方面是有效的。

本文章由计算机程序翻译，如有差异，请以英文原文为准。

查看原文本刊更多论文

Sample selection using multi-task autoencoders in federated learning with non-IID data

Federated learning is a machine learning paradigm in which multiple devices collaboratively train a model under the supervision of a central server while ensuring data privacy. However, its performance is often hindered by redundant, malicious, or abnormal samples, leading to model degradation and inefficiency. To overcome these issues, we propose novel sample selection methods for image classification, employing a multi-task autoencoder to estimate sample contributions through loss and feature analysis. Our approach incorporates unsupervised outlier detection, using one-class support vector machine (OCSVM), isolation forest (IF), and adaptive loss threshold (AT) methods managed by a central server to filter noisy samples on clients. We also propose a multi-class deep support vector data description (SVDD) loss controlled by a central server to enhance feature-based sample selection. We validate our methods on CIFAR10 and MNIST datasets across varying numbers of clients, non-IID distributions, and noise levels up to 40%. The results show significant accuracy improvements with loss-based sample selection, achieving gains of up to 7.02% on CIFAR10 with OCSVM and 1.83% on MNIST with AT. Additionally, our federated SVDD loss further improves feature-based sample selection, yielding accuracy gains of up to 0.99% on CIFAR10 with OCSVM. These results show the effectiveness of our methods in improving model accuracy across various client counts and noise conditions.

求助全文

通过发布文献求助，成功后即可免费获取论文全文。去求助

来源期刊

Engineering Science and Technology-An International Journal-Jestech Materials Science-Electronic, Optical and Magnetic Materials

CiteScore

11.20

自引率

3.50%

发文量

153

审稿时长

22 days

期刊介绍： Engineering Science and Technology, an International Journal (JESTECH) (formerly Technology), a peer-reviewed quarterly engineering journal, publishes both theoretical and experimental high quality papers of permanent interest, not previously published in journals, in the field of engineering and applied science which aims to promote the theory and practice of technology and engineering. In addition to peer-reviewed original research papers, the Editorial Board welcomes original research reports, state-of-the-art reviews and communications in the broadly defined field of engineering science and technology. The scope of JESTECH includes a wide spectrum of subjects including: -Electrical/Electronics and Computer Engineering (Biomedical Engineering and Instrumentation; Coding, Cryptography, and Information Protection; Communications, Networks, Mobile Computing and Distributed Systems; Compilers and Operating Systems; Computer Architecture, Parallel Processing, and Dependability; Computer Vision and Robotics; Control Theory; Electromagnetic Waves, Microwave Techniques and Antennas; Embedded Systems; Integrated Circuits, VLSI Design, Testing, and CAD; Microelectromechanical Systems; Microelectronics, and Electronic Devices and Circuits; Power, Energy and Energy Conversion Systems; Signal, Image, and Speech Processing) -Mechanical and Civil Engineering (Automotive Technologies; Biomechanics; Construction Materials; Design and Manufacturing; Dynamics and Control; Energy Generation, Utilization, Conversion, and Storage; Fluid Mechanics and Hydraulics; Heat and Mass Transfer; Micro-Nano Sciences; Renewable and Sustainable Energy Technologies; Robotics and Mechatronics; Solid Mechanics and Structure; Thermal Sciences) -Metallurgical and Materials Engineering (Advanced Materials Science; Biomaterials; Ceramic and Inorgnanic Materials; Electronic-Magnetic Materials; Energy and Environment; Materials Characterizastion; Metallurgy; Polymers and Nanocomposites)