Malware Detection through Contextualized Vector Embeddings

Vinay Pandya, Fabio Di Troia
{"title":"Malware Detection through Contextualized Vector Embeddings","authors":"Vinay Pandya, Fabio Di Troia","doi":"10.1109/SVCC56964.2023.10165170","DOIUrl":null,"url":null,"abstract":"Detecting malware is an integral part of system security. In recent years, machine learning models have been applied with success to overcome this challenging problem. The aim of this research is to apply context-dependent word embeddings to classify malware. We extract opcodes from the malware samples and use them to generate the embeddings that train the classifiers. Transformers are a novel architecture that utilizes self-attention to handle long-range dependencies. Different transformer architectures, namely, BERT, DistilBERT, AIBERT, and RoBERTa, are implemented in this work to generate context-dependent word embeddings. Apart from using transformer models, we also experimented with ELMo, a bidirectional language model which can generate contextualized opcode embeddings. These embeddings are used to train our machine learning models in classifying samples from different malware families. We compared our contextualized results with context-free embeddings generated by Word2Vec, and HMM2Vec algorithms. The classification algorithms trained on our embeddings consist of Resnet-18 CNN, Random Forest, Support Vector Machines (SVMs), and k-Nearest Neighbours (k-NNs).","PeriodicalId":243155,"journal":{"name":"2023 Silicon Valley Cybersecurity Conference (SVCC)","volume":"312 ","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2023-05-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"2023 Silicon Valley Cybersecurity Conference (SVCC)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/SVCC56964.2023.10165170","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 0

Abstract

Detecting malware is an integral part of system security. In recent years, machine learning models have been applied with success to overcome this challenging problem. The aim of this research is to apply context-dependent word embeddings to classify malware. We extract opcodes from the malware samples and use them to generate the embeddings that train the classifiers. Transformers are a novel architecture that utilizes self-attention to handle long-range dependencies. Different transformer architectures, namely, BERT, DistilBERT, AIBERT, and RoBERTa, are implemented in this work to generate context-dependent word embeddings. Apart from using transformer models, we also experimented with ELMo, a bidirectional language model which can generate contextualized opcode embeddings. These embeddings are used to train our machine learning models in classifying samples from different malware families. We compared our contextualized results with context-free embeddings generated by Word2Vec, and HMM2Vec algorithms. The classification algorithms trained on our embeddings consist of Resnet-18 CNN, Random Forest, Support Vector Machines (SVMs), and k-Nearest Neighbours (k-NNs).
基于情境化向量嵌入的恶意软件检测
检测恶意软件是系统安全的一个组成部分。近年来,机器学习模型已经成功地应用于克服这一具有挑战性的问题。本研究的目的是应用上下文相关的词嵌入对恶意软件进行分类。我们从恶意软件样本中提取操作码,并使用它们来生成训练分类器的嵌入。变形金刚是一种新颖的体系结构,它利用自关注来处理远程依赖关系。不同的转换器架构,即BERT、DistilBERT、AIBERT和RoBERTa,在这项工作中被实现来生成上下文相关的词嵌入。除了使用转换模型外,我们还尝试了ELMo,这是一种可以生成上下文化操作码嵌入的双向语言模型。这些嵌入用于训练我们的机器学习模型对来自不同恶意软件家族的样本进行分类。我们将上下文化的结果与Word2Vec和HMM2Vec算法生成的无上下文嵌入进行了比较。在我们的嵌入上训练的分类算法包括Resnet-18 CNN、随机森林、支持向量机(svm)和k-近邻(k- nn)。
本文章由计算机程序翻译,如有差异,请以英文原文为准。
求助全文
约1分钟内获得全文 求助全文
来源期刊
自引率
0.00%
发文量
0
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
确定
请完成安全验证×
copy
已复制链接
快去分享给好友吧!
我知道了
右上角分享
点击右上角分享
0
联系我们:info@booksci.cn Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。 Copyright © 2023 布克学术 All rights reserved.
京ICP备2023020795号-1
ghs 京公网安备 11010802042870号
Book学术文献互助
Book学术文献互助群
群 号:481959085
Book学术官方微信