A fast malware detection model based on heterogeneous graph similarity search

Impact factor 4.4; CAS Zone 2 (Computer Science); JCR Q1 (COMPUTER SCIENCE, HARDWARE & ARCHITECTURE)
Tun Li, Peng Shou, Xin Wan, Qian Li, Rong Wang, Chaolong Jia, Yunpeng Xiao
Computer Networks, published 2024-09-12. DOI: 10.1016/j.comnet.2024.110799
https://www.sciencedirect.com/science/article/pii/S1389128624006315
Citations: 0

Abstract

The Android operating system has long been vulnerable to malicious software. Existing malware detection methods often fail to identify ever-evolving malware and are slow at detection. To address this, we propose a new model for rapid Android malware detection that organizes various Android entities and their relationships into a heterogeneous graph. First, to address the semantic fusion problem that arises in high-order heterogeneous graphs as the depth of the heterogeneous graph model increases, we introduce adaptive weights during node aggregation to absorb the local semantics of nodes. This places more attention on a node's own feature information during the semantic aggregation stage, thereby avoiding semantic confusion. Second, to reduce the high time cost of detecting unknown applications, we employ an incremental similarity search model. This model quickly measures the similarity between an unknown application and those in the sample set, and aggregates node weights based on similarity scores and semantic attention coefficients, enabling rapid detection. Finally, because computing node similarity scores on large graphs has high time and space complexity, we design a NeuSim model based on an encoder–decoder structure. The encoder module embeds each path instance as a vector, and the decoder converts that vector into a scalar similarity score, significantly reducing the computational complexity. Experiments demonstrate that the model not only detects malware rapidly but also captures high-level semantic relationships among applications in complex malware networks by hierarchically aggregating information from neighbors and from meta-paths of different orders. Moreover, the model achieves an AUC of 0.9356 and an F1 score of 0.9355, surpassing existing malware detection algorithms. In particular, when detecting unknown applications, the NeuSim model can double the detection speed, with an average detection time of 105 ms.
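
The abstract does not give the aggregation formula. As a minimal sketch, assuming GAT-style attention over neighbors plus a sigmoid gate acting as the "adaptive weight", keeping more of a node's own features during aggregation could look like this. The names `aggregate_with_self_gate`, `attn_vec` and `gate_vec` are hypothetical, and in a real model these parameters would be learned rather than hand-set.

```python
import math
from typing import List

Vector = List[float]


def sigmoid(x: float) -> float:
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def softmax(scores: List[float]) -> List[float]:
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def aggregate_with_self_gate(
    h_self: Vector,
    neighbors: List[Vector],
    attn_vec: Vector,  # hypothetical attention parameter, length = dim
    gate_vec: Vector,  # hypothetical gate parameter, length = 2 * dim
) -> Vector:
    """Mix a node's own features with attention-pooled neighbor features.

    beta is the assumed adaptive weight: the larger it is, the more the
    output keeps the node's own local semantics.
    """
    if not neighbors:
        return list(h_self)
    # Attention over neighbors, scored by a dot product with attn_vec.
    alphas = softmax([dot(attn_vec, h_u) for h_u in neighbors])
    pooled = [sum(a * h_u[d] for a, h_u in zip(alphas, neighbors))
              for d in range(len(h_self))]
    # Adaptive weight computed from the concatenation [h_self || pooled].
    beta = sigmoid(dot(gate_vec, h_self + pooled))
    return [beta * s + (1.0 - beta) * p for s, p in zip(h_self, pooled)]
```

With an all-zero gate vector the gate is 0.5 and the output is an even mix of the node and its pooled neighbors; a strongly positive gate score keeps the output close to the node's own features, which is the behavior the adaptive weight is meant to allow.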
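
For the detection step, the abstract says node weights are aggregated from similarity scores and semantic attention coefficients. One hypothetical reading is a weighted vote over the most relevant labelled apps, where each neighbor's weight is similarity × attention. `KnownApp`, `malware_score`, `top_k` and the 0.5 threshold are all assumptions, not the authors' definitions. Only the unknown app's scores against existing samples are needed, which is one plausible sense in which such a search is incremental.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class KnownApp:
    """A labelled app already in the graph (hypothetical record type)."""
    name: str
    similarity: float  # similarity to the unknown app, assumed in [0, 1]
    attention: float   # semantic attention coefficient, assumed >= 0
    is_malware: bool


def _weight(app: KnownApp) -> float:
    """Combined evidence weight of one labelled neighbor."""
    return app.similarity * app.attention


def malware_score(neighbors: List[KnownApp], top_k: int = 5) -> float:
    """Weighted fraction of malicious apps among the top-k heaviest neighbors."""
    ranked = sorted(neighbors, key=_weight, reverse=True)[:top_k]
    total = sum(_weight(a) for a in ranked)
    if total == 0:
        return 0.5  # no evidence either way
    return sum(_weight(a) for a in ranked if a.is_malware) / total


def classify(neighbors: List[KnownApp], threshold: float = 0.5, top_k: int = 5) -> bool:
    """Flag the unknown app as malware if the weighted score reaches the threshold."""
    return malware_score(neighbors, top_k) >= threshold
```

For example, neighbors with weights 0.9 (malicious), 0.4 and 0.1 (benign) give a score of 0.9 / 1.4, so the unknown app would be flagged at the assumed 0.5 threshold.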
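
NeuSim is described only as an encoder that embeds each path instance as a vector and a decoder that turns that vector into a scalar score. The sketch below assumes mean pooling plus one tanh layer as the encoder, a logistic output as the decoder, and averaging of per-path scores between two apps. The class `NeuSimSketch` and its untrained random weights are hypothetical stand-ins for a trained model.

```python
import math
import random
from typing import List

Vector = List[float]


class NeuSimSketch:
    """Hypothetical encoder-decoder in the spirit of NeuSim (weights untrained)."""

    def __init__(self, in_dim: int, hidden_dim: int, seed: int = 0) -> None:
        rng = random.Random(seed)
        self.w_enc = [[rng.gauss(0.0, 0.1) for _ in range(in_dim)]
                      for _ in range(hidden_dim)]
        self.b_enc = [0.0] * hidden_dim
        self.w_dec = [rng.gauss(0.0, 0.1) for _ in range(hidden_dim)]
        self.b_dec = 0.0

    def encode(self, path: List[Vector]) -> Vector:
        """Embed one path instance: mean-pool its node vectors, then a tanh layer."""
        if not path:
            raise ValueError("path instance must contain at least one node")
        dim = len(path[0])
        pooled = [sum(node[d] for node in path) / len(path) for d in range(dim)]
        return [math.tanh(sum(w * x for w, x in zip(row, pooled)) + b)
                for row, b in zip(self.w_enc, self.b_enc)]

    def decode(self, z: Vector) -> float:
        """Map a path vector to a scalar similarity in (0, 1)."""
        s = sum(w * x for w, x in zip(self.w_dec, z)) + self.b_dec
        return 1.0 / (1.0 + math.exp(-s))

    def similarity(self, paths: List[List[Vector]]) -> float:
        """Assumed aggregation: average the per-path scores between two apps."""
        if not paths:
            raise ValueError("need at least one path instance")
        scores = [self.decode(self.encode(p)) for p in paths]
        return sum(scores) / len(scores)
```

Mean pooling ignores node order along the path; a sequence encoder such as a GRU would preserve it at higher cost. In either case each query costs time proportional to the number and length of the path instances considered, rather than a similarity computation over the whole graph, which is the kind of saving the abstract attributes to NeuSim.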

Source journal: Computer Networks (Engineering & Technology: Telecommunications)
CiteScore: 10.80
Self-citation rate: 3.60%
Publication volume: 434 articles
Review time: 8.6 months
About the journal: Computer Networks is an international, archival journal providing a publication vehicle for complete coverage of all topics of interest to those involved in the computer communications networking area. The audience includes researchers, managers and operators of networks as well as designers and implementors. The Editorial Board will consider any material for publication that is of interest to those groups.