基于杜鹃沙盒生成报告的Windows恶意软件检测采用机器学习算法

S. Darshan, A. AjayKumaraM., C. Jaidhar
{"title":"基于杜鹃沙盒生成报告的Windows恶意软件检测采用机器学习算法","authors":"S. Darshan, A. AjayKumaraM., C. Jaidhar","doi":"10.1109/ICIINFS.2016.8262998","DOIUrl":null,"url":null,"abstract":"Malicious software or malware has grown rapidly and many anti-malware defensive solutions have failed to detect the unknown malware since most of them rely on signature-based technique. This technique can detect a malware based on a pre-defined signature, which achieves poor performance when attempting to classify unseen malware with the capability to evade detection using various code obfuscation techniques. This growing evasion capability of new and unknown malwares needs to be countered by analyzing the malware dynamically in a sandbox environment, since the sandbox provides an isolated environment for analyzing the behavior of the malware. In this paper, the malware is executed on to the cuckoo sandbox to obtain its run-time behavior. At the end of the execution, the cuckoo sandbox reports the system calls invoked by the malware during execution. However, this report is in JSON format and has to be converted to MIST format to extract the system calls. The collected system calls are structured in the form of N-Grams, which help to build the classifier by using the Information Gain (IG) as a feature selection technique. A comprehensive experiment was conducted to perceive the best fit classifier among the chosen classifiers, including the Bayesian-Logistic-Regression, SPegasos, IB1, Bagging, Part, and J48 defined within the WEKA tool. From the experimental results, the overall best performance for all the selected top N-Grams such as 200, 400, and 600 goes to SPegasos with the highest accuracy, highest True Positive Rate (TPR), and lowest False Positive Rate (FPR).","PeriodicalId":234609,"journal":{"name":"2016 11th International Conference on Industrial and Information Systems (ICIIS)","volume":"16 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2016-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"22","resultStr":"{\"title\":\"Windows malware detection based on cuckoo sandbox generated report using machine learning algorithm\",\"authors\":\"S. Darshan, A. AjayKumaraM., C. Jaidhar\",\"doi\":\"10.1109/ICIINFS.2016.8262998\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Malicious software or malware has grown rapidly and many anti-malware defensive solutions have failed to detect the unknown malware since most of them rely on signature-based technique. This technique can detect a malware based on a pre-defined signature, which achieves poor performance when attempting to classify unseen malware with the capability to evade detection using various code obfuscation techniques. This growing evasion capability of new and unknown malwares needs to be countered by analyzing the malware dynamically in a sandbox environment, since the sandbox provides an isolated environment for analyzing the behavior of the malware. In this paper, the malware is executed on to the cuckoo sandbox to obtain its run-time behavior. At the end of the execution, the cuckoo sandbox reports the system calls invoked by the malware during execution. However, this report is in JSON format and has to be converted to MIST format to extract the system calls. The collected system calls are structured in the form of N-Grams, which help to build the classifier by using the Information Gain (IG) as a feature selection technique. A comprehensive experiment was conducted to perceive the best fit classifier among the chosen classifiers, including the Bayesian-Logistic-Regression, SPegasos, IB1, Bagging, Part, and J48 defined within the WEKA tool. From the experimental results, the overall best performance for all the selected top N-Grams such as 200, 400, and 600 goes to SPegasos with the highest accuracy, highest True Positive Rate (TPR), and lowest False Positive Rate (FPR).\",\"PeriodicalId\":234609,\"journal\":{\"name\":\"2016 11th International Conference on Industrial and Information Systems (ICIIS)\",\"volume\":\"16 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2016-12-01\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"22\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"2016 11th International Conference on Industrial and Information Systems (ICIIS)\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1109/ICIINFS.2016.8262998\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"2016 11th International Conference on Industrial and Information Systems (ICIIS)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/ICIINFS.2016.8262998","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 22

摘要

随着恶意软件的迅速发展,许多反恶意软件的防御方案由于依赖于基于签名的技术而无法检测到未知的恶意软件。该技术可以基于预定义的签名检测恶意软件,当试图对不可见的恶意软件进行分类时,这种方法的性能很差,并且可以使用各种代码混淆技术逃避检测。由于沙箱为分析恶意软件的行为提供了一个孤立的环境,因此需要通过在沙箱环境中动态分析恶意软件来应对新的和未知的恶意软件日益增长的逃避能力。在本文中,恶意软件在布谷鸟沙盒上执行以获得其运行时行为。在执行结束时,杜鹃沙盒报告恶意软件在执行期间调用的系统调用。但是,该报告是JSON格式的,必须转换为MIST格式才能提取系统调用。收集到的系统调用以n - gram的形式进行结构化,这有助于通过使用信息增益(Information Gain, IG)作为特征选择技术来构建分类器。在WEKA工具中定义的贝叶斯-逻辑回归、SPegasos、IB1、Bagging、Part和J48等分类器中,进行了一个全面的实验来感知最适合的分类器。从实验结果来看,在所有选择的前n - gram(如200,400和600)中,SPegasos具有最高的准确率,最高的真阳性率(TPR)和最低的假阳性率(FPR)。
本文章由计算机程序翻译,如有差异,请以英文原文为准。
Windows malware detection based on cuckoo sandbox generated report using machine learning algorithm
Malicious software or malware has grown rapidly and many anti-malware defensive solutions have failed to detect the unknown malware since most of them rely on signature-based technique. This technique can detect a malware based on a pre-defined signature, which achieves poor performance when attempting to classify unseen malware with the capability to evade detection using various code obfuscation techniques. This growing evasion capability of new and unknown malwares needs to be countered by analyzing the malware dynamically in a sandbox environment, since the sandbox provides an isolated environment for analyzing the behavior of the malware. In this paper, the malware is executed on to the cuckoo sandbox to obtain its run-time behavior. At the end of the execution, the cuckoo sandbox reports the system calls invoked by the malware during execution. However, this report is in JSON format and has to be converted to MIST format to extract the system calls. The collected system calls are structured in the form of N-Grams, which help to build the classifier by using the Information Gain (IG) as a feature selection technique. A comprehensive experiment was conducted to perceive the best fit classifier among the chosen classifiers, including the Bayesian-Logistic-Regression, SPegasos, IB1, Bagging, Part, and J48 defined within the WEKA tool. From the experimental results, the overall best performance for all the selected top N-Grams such as 200, 400, and 600 goes to SPegasos with the highest accuracy, highest True Positive Rate (TPR), and lowest False Positive Rate (FPR).
求助全文
通过发布文献求助,成功后即可免费获取论文全文。 去求助
来源期刊
自引率
0.00%
发文量
0
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
确定
请完成安全验证×
copy
已复制链接
快去分享给好友吧!
我知道了
右上角分享
点击右上角分享
0
联系我们:info@booksci.cn Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。 Copyright © 2023 布克学术 All rights reserved.
京ICP备2023020795号-1
ghs 京公网安备 11010802042870号
Book学术文献互助
Book学术文献互助群
群 号:481959085
Book学术官方微信