Boosting the Detection of Malicious Documents Using Designated Active Learning Methods

2015 IEEE 14th International Conference on Machine Learning and Applications (ICMLA), published 2015-12-01. DOI: 10.1109/ICMLA.2015.52

N. Nissim, Aviad Cohen, Y. Elovici


Citations: 10

Abstract

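Most organizations create, send, and receive huge numbers of documents every day. Attackers increasingly exploit unsuspecting users who casually open email messages that appear benign but carry malicious documents. Recent targeted attacks against organizations have used the newer Microsoft Word document format (*.docx). Anti-virus software fails to detect new, unknown malicious files, including malicious docx files. In this study, we present the SFEM feature extraction methodology and designated Active Learning (AL) methods aimed at accurately detecting new, unknown malicious docx files while efficiently enhancing the detection model's capabilities over time. Our AL methods identify and acquire only a small set of new docx files that are most likely to be malicious, together with informative benign files; these files are used to enhance the knowledge stores of both the detection model and the anti-virus software. Results show that our AL methods used only 14% of the labeled docx files within the organization, which led to a 95.5% reduction in labeling effort compared to passive learning and SVM-Margin (an existing AL method). Our AL methods also showed a significant improvement of 91% in unknown docx malware acquisition compared to passive learning and SVM-Margin, thereby providing an improved solution for updating both the detection model and the anti-virus software widely used within organizations.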
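The abstract contrasts SVM-Margin, an existing AL method, with designated AL methods that favor acquiring files most likely to be malicious alongside informative benign ones. The sketch below illustrates that contrast under stated assumptions: the detector is a linear SVM reduced to a hypothetical weight vector `w` and bias `b`, positive scores mean "malicious", and `exploitation` and `combination` are illustrative selection rules named for this example only. The abstract does not give the authors' exact selection criteria, so these two rules should not be read as the paper's method. `svm_margin` is the standard uncertainty-sampling rule that queries the samples closest to the separating hyperplane.

```python
"""Illustrative active-learning query rules for a linear docx-malware classifier.

Assumptions (not from the paper): a trained linear SVM is represented by a
weight vector `w` and bias `b`, and a positive score means "malicious".
"""
from typing import List, Sequence


def decision(w: Sequence[float], b: float, x: Sequence[float]) -> float:
    """Signed score w.x + b, proportional to the distance from the hyperplane."""
    return sum(wi * xi for wi, xi in zip(w, x)) + b


def svm_margin(scores: Sequence[float], k: int) -> List[int]:
    """SVM-Margin: the k unlabeled samples closest to the hyperplane."""
    order = sorted(range(len(scores)), key=lambda i: abs(scores[i]))
    return order[:k]


def exploitation(scores: Sequence[float], k: int) -> List[int]:
    """Hypothetical rule: the k samples scored as most likely malicious."""
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return order[:k]


def combination(scores: Sequence[float], k: int) -> List[int]:
    """Hypothetical mix: about half most-likely-malicious, rest nearest the margin."""
    chosen = exploitation(scores, (k + 1) // 2)
    # Fill the remaining budget with the most uncertain samples not yet chosen.
    for i in svm_margin(scores, len(scores)):
        if len(chosen) >= k:
            break
        if i not in chosen:
            chosen.append(i)
    return chosen


if __name__ == "__main__":
    # Toy model and unlabeled pool of feature vectors (made-up values).
    w, b = [1.0, -0.5], -0.2
    pool = [[0.1, 0.0], [2.0, 0.1], [0.0, 1.0], [1.5, 0.0], [0.3, 0.1]]
    scores = [decision(w, b, x) for x in pool]
    print(svm_margin(scores, 2))    # [4, 0]
    print(exploitation(scores, 2))  # [1, 3]
    print(combination(scores, 3))   # [1, 3, 4]
```

In an active-learning loop, only the selected indices would be sent to an analyst for labeling and added to the training set before retraining; the remaining files stay unlabeled. The contrast is that SVM-Margin spends the budget on uncertain files, while an exploitation-style rule spends it on files the current model already suspects, which is the kind of behavior the abstract credits for acquiring more unknown malware.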



