SATTVA: SpArsiTy inspired classificaTion of malware VAriants

L. Nataraj, S. Karthikeyan, B. S. Manjunath
DOI: 10.1145/2756601.2756616
Published in: Proceedings of the 3rd ACM Workshop on Information Hiding and Multimedia Security, 2015-06-17
Link: https://doi.org/10.1145/2756601.2756616
Citations: 22

Abstract

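There is an alarming increase in the amount of malware being generated today. However, several studies have shown that most of this new malware consists of variants of existing samples. Fast detection of these variants therefore plays an important role in thwarting new attacks. In this paper, we propose a novel approach to detecting malware variants using a sparse representation framework. Exploiting the fact that most malware variants differ only slightly in structure, we model a new or unknown malware sample as a sparse linear combination of the other malware samples in the training set. The unknown sample is then assigned to the class with the smallest residual error. Experiments on two standard malware datasets, the Malheur dataset and the Malimg dataset, show that our method outperforms current state-of-the-art approaches, achieving classification accuracies of 98.55% and 92.83%, respectively. Further, by using a confidence measure to reject outliers, we obtain 100% accuracy on both datasets at the cost of discarding a small percentage of samples as outliers. Finally, we evaluate our technique on two large-scale malware datasets: the Offensive Computing dataset (2,124 classes, 42,480 malware samples) and the Anubis dataset (209 classes, 36,784 samples). On both datasets, our method achieved an average classification accuracy of 77%, which makes it applicable to real-world malware classification.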
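The classification rule described above (sparse-code the unknown sample over all training samples, then pick the class whose coefficients alone give the smallest residual) is the general sparse-representation classification (SRC) scheme, and it can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the authors' implementation: it assumes every sample has already been turned into a fixed-length feature vector (the abstract does not say how SATTVA extracts features), it solves the L1-regularized sparse-coding step with ISTA (iterative soft-thresholding) rather than whatever solver the paper used, and it uses the Sparsity Concentration Index (SCI) of Wright et al. (2009) as a stand-in for the paper's confidence measure. The function names `classify`, `sparse_code`, `sci`, and `accept`, and the parameters `lam`, `iters`, and `tau`, are hypothetical.

```python
import math
from typing import Dict, List, Sequence, Tuple

Vector = List[float]


def unit(v: Sequence[float]) -> Vector:
    """Scale a vector to unit L2 norm (SRC usually normalizes every dictionary atom)."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm > 0 else list(v)


def reconstruct(atoms: List[Vector], coefs: Sequence[float]) -> Vector:
    """Return A @ coefs, i.e. sum_i coefs[i] * atoms[i]."""
    return [sum(c * a[j] for c, a in zip(coefs, atoms)) for j in range(len(atoms[0]))]


def soft_threshold(x: float, t: float) -> float:
    """Proximal operator of t*|x|: shrink x toward zero by t."""
    if x > t:
        return x - t
    if x < -t:
        return x + t
    return 0.0


def sparse_code(atoms: List[Vector], y: Vector, lam: float, iters: int) -> Vector:
    """Approximate argmin_a 0.5*||y - A a||^2 + lam*||a||_1 with ISTA (assumed solver)."""
    # ||A^T A||_2 <= trace(A^T A) = sum of squared atom norms, so 1/trace is a safe step size.
    step = 1.0 / sum(sum(x * x for x in a) for a in atoms)
    coefs = [0.0] * len(atoms)
    for _ in range(iters):
        residual = [yj - rj for yj, rj in zip(y, reconstruct(atoms, coefs))]
        # Gradient step on the least-squares term, then shrinkage for the L1 term.
        coefs = [
            soft_threshold(c + step * sum(aj * rj for aj, rj in zip(a, residual)), lam * step)
            for c, a in zip(coefs, atoms)
        ]
    return coefs


def sci(coefs: Sequence[float], labels: Sequence[str]) -> float:
    """Sparsity Concentration Index: 1 if all weight sits in one class, 0 if spread evenly."""
    classes = set(labels)
    k = len(classes)
    total = sum(abs(c) for c in coefs)
    if k < 2:
        return 1.0  # with a single class, all weight is trivially concentrated
    if total == 0.0:
        return 0.0  # no coefficients at all: no evidence for any class
    best = max(sum(abs(c) for c, lbl in zip(coefs, labels) if lbl == cls) for cls in classes)
    return (k * best / total - 1.0) / (k - 1)


def classify(
    train: List[Tuple[Sequence[float], str]],
    query: Sequence[float],
    lam: float = 0.01,
    iters: int = 1000,
) -> Tuple[str, Dict[str, float], float]:
    """Sparse-code the query over all training samples; pick the class with the least residual."""
    atoms = [unit(v) for v, _ in train]
    labels = [lbl for _, lbl in train]
    y = unit(query)
    coefs = sparse_code(atoms, y, lam, iters)
    residuals: Dict[str, float] = {}
    for cls in set(labels):
        # Keep only this class's coefficients (delta_c in SRC notation) and measure the fit.
        kept = [c if lbl == cls else 0.0 for c, lbl in zip(coefs, labels)]
        approx = reconstruct(atoms, kept)
        residuals[cls] = math.sqrt(sum((yj - aj) ** 2 for yj, aj in zip(y, approx)))
    predicted = min(residuals, key=residuals.get)
    return predicted, residuals, sci(coefs, labels)


def accept(score: float, tau: float = 0.5) -> bool:
    """Hypothetical outlier rule: trust the prediction only if its SCI reaches tau."""
    return score >= tau
```

For example, with two toy families whose feature vectors occupy different coordinates, a query lying in the first family's span is assigned to that family with an SCI near 1, while a query split evenly between both spans gets an SCI near 0 and is rejected by `accept`. Raising `tau` trades coverage for accuracy, which is the kind of trade-off the abstract describes when it reports 100% accuracy after discarding a small fraction of outliers. The sketch stays dependency-free for readability; ISTA converges slowly, so a practical version would likely use a dedicated L1 solver on much larger dictionaries.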