FASNet: Federated adversarial Siamese networks for robust malware image classification

IF 3.4; Zone 3, Computer Science; JCR Q1, COMPUTER SCIENCE, THEORY & METHODS

Journal of Parallel and Distributed Computing Pub Date : 2025-01-16 DOI:10.1016/j.jpdc.2025.105039

Namrata Govind Ambekar , Sonali Samal , N. Nandini Devi , Surmila Thokchom

Volume 198, Article 105039. Full text: https://www.sciencedirect.com/science/article/pii/S0743731525000061

Citations: 0

Abstract

Malware detection is difficult because cyber threats are complex and evolve continuously. Deep learning models have proved effective at identifying malware within organizations. However, building a reliable distributed malware detection model from diverse, multi-source data remains hard: privacy concerns compound the problems of data distribution and the lack of balanced datasets, so privacy-preserving training techniques are needed. To address this, the study introduces FASNet, a privacy-centric distributed malware detection model designed to improve detection accuracy and robustness. FASNet uses Siamese networks as feature extractors and combines them with two techniques: federated learning and adversarial training. Federated learning, implemented with three clients, keeps model training on individual devices, which removes the need for centralized data collection and addresses data privacy concerns. According to the authors, this design also avoids data dilution and communication overhead while maintaining effective training on each device. Adversarial training uses the Fast Gradient Sign Method (FGSM) to generate adversarial images that strengthen the model's resilience. By training on both original and adversarial malware images, FASNet becomes better at correctly classifying malware images that have been intentionally perturbed to mislead the system. On the Blended dataset, FASNet achieves a testing accuracy of 0.9510, precision of 0.9417, recall of 0.9510, F1 score of 0.9384, Matthews Correlation Coefficient (MCC) of 0.9464, Jaccard Index (JI) of 0.9271, and Fowlkes-Mallows Index (FMI) of 0.9725. These results indicate that FASNet addresses two main challenges: privacy-centric malware detection and training on an imbalanced dataset.
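The two training ingredients named in the abstract, FGSM adversarial examples and federated training across three clients, can be illustrated with a minimal sketch. Everything below is an assumption made for illustration and is not taken from the paper: a tiny logistic-regression model stands in for the Siamese feature extractor, the aggregation rule is plain sample-size-weighted averaging (FedAvg-style), and the function names, toy data, `eps`, `lr`, and `epochs` values are hypothetical.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def predict(params, x):
    """Probability of the positive class; params = weights followed by a bias."""
    *w, b = params
    return sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)

def fgsm(params, x, y, eps):
    """FGSM: x_adv = clip(x + eps * sign(dL/dx), 0, 1) for cross-entropy loss L."""
    w = params[:-1]
    err = predict(params, x) - y              # dL/dz for binary cross-entropy
    grad = [err * wi for wi in w]             # dL/dx = (p - y) * w
    step = [(g > 0) - (g < 0) for g in grad]  # elementwise sign, 0 where grad is 0
    return [min(1.0, max(0.0, xi + eps * s)) for xi, s in zip(x, step)]

def local_train(params, data, eps, lr=0.5, epochs=10):
    """One client's update: SGD on each clean sample and its FGSM counterpart."""
    params = list(params)
    for _ in range(epochs):
        for x, y in data:
            for sample in (x, fgsm(params, x, y, eps)):
                err = predict(params, sample) - y
                grads = [err * v for v in sample] + [err]  # weights, then bias
                params = [p - lr * g for p, g in zip(params, grads)]
    return params

def fed_avg(client_params, client_sizes):
    """Sample-size-weighted average of the clients' parameter vectors."""
    total = sum(client_sizes)
    return [sum(p[j] * n for p, n in zip(client_params, client_sizes)) / total
            for j in range(len(client_params[0]))]

def federated_round(global_params, clients, eps):
    """Each client trains on its own data; only parameters reach the server."""
    updates = [local_train(global_params, data, eps) for data in clients]
    return fed_avg(updates, [len(data) for data in clients])

# Toy setup (hypothetical): three clients, two-pixel "images" scaled to [0, 1].
clients = [
    [([0.9, 0.1], 1), ([0.2, 0.8], 0)],
    [([0.8, 0.3], 1), ([0.1, 0.7], 0)],
    [([0.7, 0.2], 1), ([0.3, 0.9], 0)],
]
params = [0.0, 0.0, 0.0]
for _ in range(5):
    params = federated_round(params, clients, eps=0.05)
```

In this sketch the raw samples never leave `local_train`; the server-side `fed_avg` sees only parameter vectors, which is the privacy property the abstract attributes to federated learning. Swapping in a real image model would change `predict` and the gradient computation but not the overall round structure.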


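Source journal

Journal of Parallel and Distributed Computing (Engineering & Technology: Computer Science, Theory & Methods)

CiteScore: 10.30

Self-citation rate: 2.60%

Articles published: 172

Review time: 12 months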

About the journal: This international journal is directed to researchers, engineers, educators, managers, programmers, and users of computers who have particular interests in parallel processing and/or distributed computing. The Journal of Parallel and Distributed Computing publishes original research papers and timely review articles on the theory, design, evaluation, and use of parallel and/or distributed computing systems. The journal also features special issues on these topics, covering the full range from the design to the use of such systems.