CNN-AutoMIC: Combining convolutional neural network and autoencoder to learn non-linear features for KNN-based malware image classification

IF 4.8 · Zone 2, Computer Science · JCR Q1, COMPUTER SCIENCE, INFORMATION SYSTEMS
Simone Andriani, Stefano Galantucci, Andrea Iannacone, Antonio Maci, Giuseppe Pirlo
Computers & Security, vol. 156, Article 104507 (published 2025-05-19). DOI: 10.1016/j.cose.2025.104507. Source: https://www.sciencedirect.com/science/article/pii/S0167404825001968
Citations: 0

Abstract

Malware refers to malicious software or a component of software intended for malicious purposes. The manual analysis and detection of malicious software is challenging due to its complexity. Thus, several automated solutions have become popular for real-time malware detection. A widespread approach consists of generating images from the samples' bytecode and giving them to convolutional neural networks (CNNs), which are used either as classifiers or as feature extractors for further classification algorithms. These systems perform extremely well when trained and tested on partitions of the same dataset. In practice, however, cross-dataset tests and verification on emerging real-world samples are required, and this is a crucial challenge when probing the robustness of such systems and models. This paper proposes CNN-AutoMIC, a robust automated approach to extract features from malware images. CNN-AutoMIC employs a specific CNN architecture to extract features, followed by an autoencoder-based compressor that reduces the features to two fundamental components. The two-dimensional projection of these components is the basis of the predictions performed by the K-nearest neighbors (K-NN) algorithm. Moreover, the observable placement of new samples on the resulting scatter plot makes it possible to explain why the AI-based system produced a certain prediction. CNN-AutoMIC was benchmarked against several CNN-based models and a Vision Transformer. All models were trained on the Malevis dataset and evaluated cross-dataset on four different real-world datasets. CNN-AutoMIC outperformed the competitors on each classification performance metric, while requiring a reasonable training and prediction time. In addition, it achieves a promising Akaike information criterion (AIC) score, indicating its efficiency in terms of model complexity.
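The first and last stages described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the CNN and the autoencoder are omitted, the 2-D points below are made-up stand-ins for the autoencoder's output, and the image width, the value of k, the family labels, and the function names `bytes_to_image` and `knn_predict` are all assumptions chosen for illustration.

```python
from collections import Counter
from math import dist


def bytes_to_image(data, width=16):
    """Lay raw bytes out row by row as a grayscale image (one 0-255 value per pixel).

    The last row is zero-padded. The width is an illustrative choice,
    not a value taken from the paper.
    """
    padded = data + bytes(-len(data) % width)
    return [list(padded[i:i + width]) for i in range(0, len(padded), width)]


def knn_predict(train_points, train_labels, query, k=3):
    """Return the majority label among the k training points closest to `query`."""
    nearest = sorted(
        zip(train_points, train_labels),
        key=lambda pair: dist(pair[0], query),  # Euclidean distance in the 2-D plane
    )[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


# Hypothetical 2-D codes standing in for the compressed features of known samples.
train_points = [(0.10, 0.20), (0.20, 0.10), (0.15, 0.25),
                (0.90, 0.80), (0.80, 0.90), (0.85, 0.75)]
train_labels = ["family_a"] * 3 + ["family_b"] * 3

image = bytes_to_image(b"MZ\x90\x00\x03", width=4)
prediction = knn_predict(train_points, train_labels, query=(0.82, 0.83), k=3)
print(image)
print(prediction)
```

Because every sample is reduced to a point in the plane, the neighbors that decide a prediction can be plotted directly, which is the kind of visual explanation the abstract refers to.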
Source journal
Computers & Security (Engineering & Technology, Computer Science: Information Systems)
CiteScore: 12.40
Self-citation rate: 7.10%
Publication volume: 365
Review time: 10.7 months
Journal description: Computers & Security is the most respected technical journal in the IT security field. With its high-profile editorial board and informative regular features and columns, the journal is essential reading for IT security professionals around the world. Computers & Security provides you with a unique blend of leading-edge research and sound practical management advice. It is aimed at the professional involved with computer security, audit, control and data integrity in all sectors: industry, commerce and academia. Recognized worldwide as the primary source of reference for applied research and technical expertise, it is your first step to fully secure systems.