Boosting the Detection of Malicious Documents Using Designated Active Learning Methods

N. Nissim, Aviad Cohen, Y. Elovici
Published in: 2015 IEEE 14th International Conference on Machine Learning and Applications (ICMLA), 2015-12-01
DOI: 10.1109/ICMLA.2015.52 (https://doi.org/10.1109/ICMLA.2015.52)
Citations: 10

Abstract

Most organizations create, send, and receive huge numbers of documents daily. Attackers increasingly take advantage of innocent users who tend to casually open email messages that are assumed to be benign but carry malicious documents. Recent targeted attacks aimed at organizations utilize the new Microsoft Word documents (*.docx). Anti-virus software fails to detect new, unknown malicious files, including malicious docx files. In this study, we present the SFEM feature extraction methodology and designated Active Learning (AL) methods, aimed at accurate detection of new, unknown malicious docx files while efficiently enhancing the detection model's capabilities over time. Our AL methods identify and acquire only a small set of new docx files that are most likely malicious, as well as informative benign files; these files are used to enhance the knowledge stores of both the detection model and the anti-virus software. Results show that our active learning methods used only 14% of the labeled docx files within the organization, which led to a reduction of 95.5% in labeling efforts compared to passive learning and SVM-Margin (an existing active learning method). Our AL methods also showed a significant improvement of 91% in unknown docx malware acquisition compared to passive learning and SVM-Margin, thus providing an improved updating solution for the detection model, as well as for the anti-virus software widely used within organizations.
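The abstract contrasts SVM-Margin with acquisition of the files "most likely malicious" but does not spell out the selection rules. The following is only a minimal sketch of how the two ideas might differ, assuming each unlabeled file already has a signed distance from an SVM separating hyperplane, with positive values on the malicious side. The function names, file names, and scores are hypothetical and are not taken from the paper.

```python
from typing import Dict, List


def select_margin(scores: Dict[str, float], k: int) -> List[str]:
    """SVM-Margin style: pick the k files closest to the separating hyperplane."""
    return sorted(scores, key=lambda name: abs(scores[name]))[:k]


def select_likely_malicious(scores: Dict[str, float], k: int) -> List[str]:
    """Pick up to k files lying deepest on the (assumed positive) malicious side."""
    malicious_side = [name for name in scores if scores[name] > 0]
    return sorted(malicious_side, key=lambda name: scores[name], reverse=True)[:k]


# Hypothetical signed distances from the hyperplane for five unlabeled files.
scores = {
    "a.docx": 2.3,
    "b.docx": -0.1,
    "c.docx": 0.4,
    "d.docx": -1.8,
    "e.docx": 1.1,
}

margin_picks = select_margin(scores, 2)
malicious_picks = select_likely_malicious(scores, 2)
print(margin_picks)
print(malicious_picks)
```

Under these assumptions, the margin rule favors the files the classifier is least sure about, while the second rule favors files the classifier already places firmly on the malicious side. That is one plausible reading of why the latter could acquire more unknown malware per labeled file.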