Tun Li, Peng Shou, Xin Wan, Qian Li, Rong Wang, Chaolong Jia, Yunpeng Xiao
Journal: Computer Networks (JCR Q1, Computer Science, Hardware & Architecture; impact factor 4.4). Published: 2024-09-12. DOI: 10.1016/j.comnet.2024.110799. URL: https://www.sciencedirect.com/science/article/pii/S1389128624006315
A fast malware detection model based on heterogeneous graph similarity search
The Android operating system has long been a target of malicious software. Existing malware detection methods often fail to identify ever-evolving malware and are slow. To address this, we propose a new model for rapid Android malware detection that represents Android entities and their relationships as a heterogeneous graph. First, to address the semantic fusion problem that arises in high-order heterogeneous graphs as the depth of the heterogeneous graph model increases, we introduce adaptive weights during node aggregation to absorb the local semantics of nodes. This places more attention on each node's own feature information during the semantic aggregation stage, thereby avoiding semantic confusion. Second, to reduce the high time cost of detecting unknown applications, we employ an incremental similarity search model. This model quickly measures the similarity between an unknown application and the applications already in the sample set, then aggregates node weights based on similarity scores and semantic attention coefficients, enabling rapid detection. Lastly, because calculating node similarity scores on large graphs has high time and space complexity, we design a NeuSim model based on an encoder–decoder structure. The encoder module embeds each path instance as a vector, and the decoder converts that vector into a scalar similarity score, significantly reducing the complexity of the calculation. Experiments demonstrate that this model not only detects malware rapidly but also captures high-level semantic relationships among applications in complex malware networks by hierarchically aggregating information from neighbors and meta-paths of different orders. Moreover, the model achieved an AUC of 0.9356 and an F1 score of 0.9355, surpassing existing malware detection algorithms. In the detection of unknown applications in particular, the NeuSim model can double the detection speed, with an average detection time of 105 ms.
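The encoder–decoder idea in the abstract can be illustrated with a minimal Python sketch. It is not the authors' NeuSim implementation: the abstract does not specify the architecture, so the mean pooling, the tanh and sigmoid layers, the layer sizes, the random (untrained) weights, and the softmax-weighted vote that stands in for the similarity-based aggregation are all assumptions, as are the function names.

```python
import math
import random


def encode_path(path, w_enc):
    """Encoder (assumed form): mean-pool the node feature vectors of one
    path instance, then apply a linear map followed by tanh."""
    dim = len(path[0])
    pooled = [sum(node[i] for node in path) / len(path) for i in range(dim)]
    return [math.tanh(sum(w * x for w, x in zip(row, pooled))) for row in w_enc]


def decode_score(embedding, w_dec):
    """Decoder (assumed form): linear layer plus sigmoid, giving a scalar
    similarity score in (0, 1)."""
    z = sum(w * e for w, e in zip(w_dec, embedding))
    return 1.0 / (1.0 + math.exp(-z))


def path_similarity(path_instances, w_enc, w_dec):
    """Average the decoded scores over all path instances that connect an
    unknown app to one known app."""
    scores = [decode_score(encode_path(p, w_enc), w_dec) for p in path_instances]
    return sum(scores) / len(scores)


def similarity_weighted_label(sim_scores, labels):
    """Hypothetical stand-in for similarity-based aggregation: a softmax over
    similarity scores weights the known labels (1 = malicious, 0 = benign)."""
    top = max(sim_scores)
    exps = [math.exp(s - top) for s in sim_scores]
    total = sum(exps)
    return sum(e / total * y for e, y in zip(exps, labels))


# Untrained random weights, for illustration only.
rng = random.Random(0)
FEAT_DIM, HIDDEN_DIM = 3, 4
w_enc = [[rng.uniform(-1, 1) for _ in range(FEAT_DIM)] for _ in range(HIDDEN_DIM)]
w_dec = [rng.uniform(-1, 1) for _ in range(HIDDEN_DIM)]

# Each path instance is a list of node feature vectors (made-up values),
# e.g. unknown app -> shared API node -> known app.
paths_to_known_a = [
    [[0.2, 0.1, 0.9], [0.5, 0.5, 0.0], [0.3, 0.2, 0.8]],
    [[0.2, 0.1, 0.9], [0.1, 0.7, 0.4], [0.3, 0.2, 0.8]],
]
paths_to_known_b = [
    [[0.2, 0.1, 0.9], [0.9, 0.0, 0.1], [0.8, 0.6, 0.1]],
]

sims = [
    path_similarity(paths_to_known_a, w_enc, w_dec),
    path_similarity(paths_to_known_b, w_enc, w_dec),
]
# Known app A is labeled malicious (1) and known app B benign (0).
risk = similarity_weighted_label(sims, labels=[1, 0])
print("similarities:", sims, "malicious score:", risk)
```

Under these assumptions, scoring a path instance costs one small matrix-vector product, and a new application only needs scores against the path instances that reach it. This is one way to read the abstract's claim that the encoder–decoder design reduces the cost of similarity computation on large graphs.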
About the journal:
Computer Networks is an international, archival journal providing a publication vehicle for complete coverage of all topics of interest to those involved in the computer communications networking area. The audience includes researchers, managers and operators of networks as well as designers and implementors. The Editorial Board will consider any material for publication that is of interest to those groups.