Detection of Malware using Machine Learning based on Operation Code Frequency

Pavitra Mohandas, Sudesh Kumar Santhosh Kumar, Sandeep Pai Kulyadi, M. J. Shankar Raman, Vasan V S, B. Venkataswami
Published in: 2021 IEEE International Conference on Industry 4.0, Artificial Intelligence, and Communications Technology (IAICT)
Publication date: 2021-07-27
DOI: 10.1109/IAICT52856.2021.9532521 (https://doi.org/10.1109/IAICT52856.2021.9532521)
Citations: 2

Abstract

One of the many methods for identifying malware is to disassemble suspect files and extract their opcodes. Because malware has predominantly been found to contain specific opcode sequences, the presence of those sequences in an incoming file or in network content can serve as a malware identification scheme. Malware detection systems also help us understand how malware attacks a system and how such attacks can be prevented. The proposed method analyses executable files using opcode information: each incoming executable is converted to assembly language, and the count of each opcode is extracted from the result. The opcode counts are then converted into opcode frequencies, which are stored in a CSV file. The CSV file is passed to several machine learning algorithms: a Decision Tree Classifier, a Random Forest Classifier and a Naive Bayes Classifier. The Random Forest Classifier produced the highest accuracy, so that model was used to predict whether an incoming file is potentially malware.
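The count-to-frequency step described in the abstract can be sketched as follows. This is a minimal illustration, not the authors' implementation. The opcode vocabulary `OPCODES`, the disassembly line layout (an optional `address:` prefix followed by the mnemonic), the CSV column names, and the definition of frequency as each opcode's count divided by the total number of counted opcodes are all assumptions, since the abstract specifies none of them.

```python
import csv
import io
from collections import Counter

# Hypothetical opcode vocabulary; the abstract does not list the opcodes tracked.
OPCODES = ["mov", "push", "pop", "call", "jmp", "add", "xor", "ret"]


def count_opcodes(disassembly):
    """Count occurrences of each vocabulary opcode in a disassembly listing.

    Assumed line layout: optional "address:" token, mnemonic, operands,
    with an optional ";" comment at the end of the line.
    """
    counts = Counter()
    for line in disassembly.splitlines():
        tokens = line.split(";", 1)[0].split()   # drop trailing comment
        if tokens and tokens[0].endswith(":"):   # drop address or label prefix
            tokens = tokens[1:]
        if tokens and tokens[0].lower() in OPCODES:
            counts[tokens[0].lower()] += 1
    return counts


def to_frequencies(counts):
    """Convert counts to relative frequencies (assumed: count / total counted)."""
    total = sum(counts.values())
    return [counts[op] / total if total else 0.0 for op in OPCODES]


def write_csv(rows, out):
    """Write one row per file: name, one frequency column per opcode, label."""
    writer = csv.writer(out)
    writer.writerow(["file"] + OPCODES + ["label"])
    for name, freqs, label in rows:
        writer.writerow([name] + [f"{f:.4f}" for f in freqs] + [label])


if __name__ == "__main__":
    # Tiny made-up listing, used only to exercise the functions above.
    sample = (
        "401000: push ebp\n"
        "401001: mov ebp, esp\n"
        "401003: mov eax, 1\n"
        "401008: ret\n"
    )
    buffer = io.StringIO()
    write_csv([("sample.exe", to_frequencies(count_opcodes(sample)), 1)], buffer)
    print(buffer.getvalue())
```

A CSV in this shape, with one row per file and a label column, could then be loaded and passed to classifiers such as scikit-learn's `DecisionTreeClassifier`, `RandomForestClassifier` and `GaussianNB`. Which library the authors actually used is not stated in the abstract.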