On Deceiving Malware Classification with Section Injection

Impact factor: 6.0 | JCR Q2 | Computer Science, Artificial Intelligence

Machine Learning and Knowledge Extraction | Pub Date: 2023-01-16 | DOI: 10.3390/make5010009

Adeilson Antonio da Silva, Maurício Pamplona Segundo

Citations: 1

Abstract

We investigate how to modify executable files to deceive malware classification systems. This work's main contribution is a methodology for injecting bytes at random locations across a malware file. It can be used both as an attack, to decrease classification accuracy, and as a defense, to augment the data available for training. The method respects the operating system's executable file format, so the malware still executes after injection and its behavior does not change. To evaluate the injection scheme, we reproduced five state-of-the-art malware classification approaches: one based on Global Image Descriptor (GIST) + K-Nearest-Neighbors (KNN), three Convolutional Neural Network (CNN) variations, and one Gated CNN. We performed our experiments on a public dataset with 9339 malware samples from 25 different families. Our results show that an increase of only 7% in malware size causes an accuracy drop of between 25% and 40% for malware family classification. This suggests that automatic malware classification systems may not be as trustworthy as initially reported in the literature. We also evaluate training on modified malware alongside the original samples to increase the networks' robustness against these attacks. The results show that a combination of reordering malware sections and injecting random data can improve overall classification performance. All the code is publicly available.
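The size budget described in the abstract can be illustrated with a toy sketch. This is not the authors' implementation: it models a file as a plain list of byte chunks standing in for sections and inserts random-byte chunks at random positions, with the added size capped at about 7% of the original. The function name `inject_random_sections` and the parameters `budget_ratio`, `n_new`, and `seed` are hypothetical. The sketch ignores everything a real executable format requires, such as headers, alignment, and section tables.

```python
import random


def inject_random_sections(sections, budget_ratio=0.07, n_new=3, seed=0):
    """Toy model of random section injection (assumption, not the paper's code).

    `sections` is a list of bytes objects standing in for file sections.
    Returns a new list with `n_new` random-byte chunks inserted at random
    positions. The total added size is at most `budget_ratio` times the
    original size, and the original chunks keep their relative order.
    """
    rng = random.Random(seed)
    original_size = sum(len(s) for s in sections)
    budget = int(original_size * budget_ratio)  # total bytes we may add
    chunk_size = budget // n_new                # bytes per injected chunk

    augmented = list(sections)
    for _ in range(n_new):
        payload = bytes(rng.randrange(256) for _ in range(chunk_size))
        position = rng.randint(0, len(augmented))  # any gap, including both ends
        augmented.insert(position, payload)
    return augmented


# Usage on placeholder data: three dummy "sections" of 1000, 2000, 3000 bytes.
sections = [b"\x01" * 1000, b"\x02" * 2000, b"\x03" * 3000]
augmented = inject_random_sections(sections)
growth = sum(map(len, augmented)) / sum(map(len, sections)) - 1.0
print(f"chunks: {len(augmented)}, size growth: {growth:.2%}")
```

Used as data augmentation, the same routine could be called with different seeds to produce several variants of each training sample; whether that matches the authors' training setup is an assumption.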
