SATTVA: SpArsiTy inspired classificaTion of malware VAriants

Proceedings of the 3rd ACM Workshop on Information Hiding and Multimedia Security. Pub Date: 2015-06-17. DOI: 10.1145/2756601.2756616

L. Nataraj, S. Karthikeyan, B. S. Manjunath


Citations: 22

Abstract


The amount of malware generated today is increasing at an alarming rate. However, several studies have shown that most new malware samples are just variants of existing ones. Fast detection of these variants plays an effective role in thwarting new attacks. In this paper, we propose a novel approach to detect malware variants using a sparse representation framework. Exploiting the fact that most malware variants differ only slightly in structure, we model a new/unknown malware sample as a sparse linear combination of other malware samples in the training set. The unknown sample is assigned to the class with the least residual error. Experiments on two standard malware datasets, the Malheur dataset and the Malimg dataset, show that our method outperforms current state-of-the-art approaches and achieves classification accuracies of 98.55% and 92.83%, respectively. Further, by using a confidence measure to reject outliers, we obtain 100% accuracy on both datasets, at the expense of discarding a small percentage of samples as outliers. Finally, we evaluate our technique on two large-scale malware datasets: the Offensive Computing dataset (2,124 classes, 42,480 samples) and the Anubis dataset (209 classes, 36,784 samples). On both datasets our method obtained an average classification accuracy of 77%, making it applicable to real-world malware classification.
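The classification rule described in the abstract (sparse coding over the training set, then picking the class with the least residual) can be illustrated with a minimal sketch. The abstract does not state which features, solver, or confidence measure the authors use, so everything specific here is an assumption: the toy 3-dimensional feature vectors, the ISTA solver for an l1-regularized least-squares problem, the `lam` and `iters` parameters, and the hypothetical `reject_ratio` test that compares the best and second-best class residuals.

```python
import math


def soft_threshold(value, threshold):
    """Shrink value toward zero by threshold (proximal step for the l1 norm)."""
    return math.copysign(max(abs(value) - threshold, 0.0), value)


def sparse_code(columns, y, lam=0.01, iters=500):
    """Approximately solve min_x 0.5*||y - A x||^2 + lam*||x||_1 with ISTA.

    `columns` holds the training samples, one feature vector per column of A.
    The solver choice is an assumption; the abstract names no solver.
    """
    n, dim = len(columns), len(y)
    # Step size 1/L, where L (squared Frobenius norm of A) bounds the
    # largest eigenvalue of A^T A.
    L = sum(v * v for col in columns for v in col) or 1.0
    x = [0.0] * n
    for _ in range(iters):
        recon = [sum(x[j] * columns[j][i] for j in range(n)) for i in range(dim)]
        resid = [r - t for r, t in zip(recon, y)]
        grad = [sum(c * r for c, r in zip(col, resid)) for col in columns]
        x = [soft_threshold(xj - g / L, lam / L) for xj, g in zip(x, grad)]
    return x


def class_residuals(columns, labels, y, x):
    """Reconstruction error of y using only the coefficients of each class."""
    residuals = {}
    for label in set(labels):
        recon = [
            sum(x[j] * columns[j][i] for j in range(len(columns)) if labels[j] == label)
            for i in range(len(y))
        ]
        residuals[label] = math.sqrt(sum((a - b) ** 2 for a, b in zip(y, recon)))
    return residuals


def classify(columns, labels, y, reject_ratio=None):
    """Return the class with the least residual, or None if rejected.

    `reject_ratio` is a hypothetical confidence test, not the paper's measure:
    the sample is rejected when the best residual is not clearly smaller
    than the second-best one.
    """
    x = sparse_code(columns, y)
    residuals = class_residuals(columns, labels, y, x)
    ranked = sorted(residuals, key=residuals.get)
    best = ranked[0]
    if reject_ratio is not None and len(ranked) > 1:
        if residuals[best] > reject_ratio * residuals[ranked[1]]:
            return None
    return best


# Toy training set (made-up feature vectors, two samples per family).
TRAIN = [[1.0, 0.0, 0.0], [0.9, 0.1, 0.0], [0.0, 0.0, 1.0], [0.0, 0.1, 0.9]]
LABELS = ["A", "A", "B", "B"]

print(classify(TRAIN, LABELS, [0.95, 0.05, 0.0]))
print(classify(TRAIN, LABELS, [0.0, 1.0, 0.0], reject_ratio=0.8))
```

In this toy setup, a query close to the family "A" samples is reconstructed almost entirely from "A" columns, so its "A" residual is much smaller than its "B" residual. A query equally far from both families yields nearly equal residuals and is rejected by the ratio test, which mirrors the accuracy-versus-rejection trade-off the abstract mentions.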
