Windows malware detection based on cuckoo sandbox generated report using machine learning algorithm

2016 11th International Conference on Industrial and Information Systems (ICIIS) Pub Date : 2016-12-01 DOI:10.1109/ICIINFS.2016.8262998

S. Darshan, A. AjayKumaraM., C. Jaidhar


Citations: 22

Abstract



Malware has grown rapidly, and many anti-malware solutions fail to detect unknown malware because most of them rely on signature-based techniques. A signature-based technique detects malware from a pre-defined signature, so it performs poorly on unseen malware that can evade detection through code obfuscation. This growing evasion capability of new and unknown malware needs to be countered by analyzing the malware dynamically in a sandbox, which provides an isolated environment for observing its behavior. In this paper, the malware is executed in the Cuckoo sandbox to obtain its run-time behavior. At the end of the execution, the Cuckoo sandbox reports the system calls invoked by the malware. This report is in JSON format, however, and has to be converted to MIST format to extract the system calls. The collected system calls are structured as N-grams, and Information Gain (IG) is used as the feature selection technique to build the classifier. A comprehensive experiment was conducted to identify the best-fit classifier among those chosen from the WEKA tool, including Bayesian-Logistic-Regression, SPegasos, IB1, Bagging, Part, and J48. In the experimental results, SPegasos gives the best overall performance for every selected number of top N-grams (200, 400, and 600), with the highest accuracy, the highest True Positive Rate (TPR), and the lowest False Positive Rate (FPR).
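The feature pipeline described in the abstract (system calls, then N-grams, then Information Gain ranking) can be illustrated with a minimal Python sketch. It is not the authors' implementation: the paper converts the report to MIST format and uses WEKA, and both steps are skipped here. The report layout read by `api_calls` (`behavior`, `processes`, `calls`, `api`) is an assumption about the Cuckoo JSON report, and every function name and the toy data are hypothetical.

```python
import math
from collections import Counter


def api_calls(report):
    """Flatten API-call names from a Cuckoo-style report dict.

    Assumption: calls live under report["behavior"]["processes"][i]["calls"][j]["api"].
    """
    return [call["api"]
            for proc in report.get("behavior", {}).get("processes", [])
            for call in proc.get("calls", [])]


def ngrams(calls, n):
    """Return the set of distinct n-grams (as tuples) in one call sequence."""
    return {tuple(calls[i:i + n]) for i in range(len(calls) - n + 1)}


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    if total == 0:
        return 0.0
    return -sum((c / total) * math.log2(c / total)
                for c in Counter(labels).values())


def information_gain(feature_present, labels):
    """IG of a binary feature: H(labels) minus H(labels | feature)."""
    with_f = [y for x, y in zip(feature_present, labels) if x]
    without_f = [y for x, y in zip(feature_present, labels) if not x]
    total = len(labels)
    conditional = (len(with_f) / total) * entropy(with_f) \
        + (len(without_f) / total) * entropy(without_f)
    return entropy(labels) - conditional


def top_ngrams(samples, labels, n, k):
    """Rank every n-gram seen in the samples by IG and keep the top k."""
    grams = [ngrams(s, n) for s in samples]
    vocab = set().union(*grams)
    scored = {g: information_gain([g in gs for gs in grams], labels)
              for g in vocab}
    # Sort by descending IG; break ties on the n-gram itself for determinism.
    return sorted(scored, key=lambda g: (-scored[g], g))[:k]


# Toy data (hypothetical): two "malware" traces (label 1), two "benign" (label 0).
samples = [
    ["NtCreateFile", "NtWriteFile", "NtClose"],
    ["NtCreateFile", "NtWriteFile", "RegSetValueExA"],
    ["NtOpenFile", "NtReadFile", "NtClose"],
    ["NtOpenFile", "NtReadFile", "NtClose"],
]
labels = [1, 1, 0, 0]
selected = top_ngrams(samples, labels, n=2, k=3)
```

Each selected N-gram would then become one binary feature (present or absent in a sample) in the vectors passed to a classifier. The paper keeps the top 200, 400, and 600 N-grams; the `k=3` here only suits the toy data.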
