CodeSAGE: A multi-feature fusion vulnerability detection approach using code attribute graphs and attention mechanisms

IF 3.8 Zone 2 (Computer Science) Q2 COMPUTER SCIENCE, INFORMATION SYSTEMS

Journal of Information Security and Applications Pub Date: 2025-01-28 DOI: 10.1016/j.jisa.2025.103973

Guodong Zhang, Tianyu Yao, Jiawei Qin, Yitao Li, Qiao Ma, Donghong Sun

Volume 89, Article 103973

https://www.sciencedirect.com/science/article/pii/S2214212625000110

Citations: 0

Abstract

Software supply chain security is a critical aspect of modern computer security, and vulnerabilities are a significant threat to it. Identifying and patching these vulnerabilities promptly can significantly reduce security risks. Traditional detection methods cannot fully capture the complex structure of source code, leading to low accuracy. Machine learning-based methods are limited by neural network capacity, which hinders effective feature extraction and degrades performance. In this paper, we propose a multi-feature fusion vulnerability detection technique called CodeSAGE. The method uses the Code Property Graph (CPG) to represent multiple logical structural relationships in the source code, and combines it with GraphSAGE, which aggregates information from neighboring nodes in the CPG to extract local features of the source code. Meanwhile, a Bi-LSTM combined with an attention mechanism captures long-range dependencies in the logical structure of the source code and extracts global features. An attention mechanism then assigns weights to the two features, which are fused to represent the syntactic and semantic information of the source code for vulnerability detection. A method for simplifying the CPG is also proposed to mitigate the impact of graph size on model runtime and to reduce redundant feature information: different edge types are weighted and nodes exceeding a certain threshold are filtered out, which removes irrelevant nodes and reduces the CPG size. To verify the effectiveness of CodeSAGE, comparative experiments are conducted on the SARD and CodeXGLUE datasets. The experimental results show that the simplification method reduces CPG size by 25%–45%, with an average reduction of 20% in time per training round. Detection accuracy reached 99.12% on the SARD dataset and 73.57% on the CodeXGLUE dataset, outperforming the comparison methods.
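
To make the neighbor-aggregation step concrete, the following is a minimal sketch of one GraphSAGE layer with a mean aggregator, written in plain Python. It is not the authors' implementation. The toy graph, the feature values, the weight matrix, and the names `mean_vector`, `matvec`, and `sage_mean_layer` are all assumptions made for illustration, and the sketch omits neighbor sampling, normalization, and training.

```python
from typing import Dict, List

Vector = List[float]


def mean_vector(vectors: List[Vector], dim: int) -> Vector:
    """Element-wise mean; returns a zero vector if there are no neighbors."""
    if not vectors:
        return [0.0] * dim
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]


def matvec(weight: List[Vector], x: Vector) -> Vector:
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in weight]


def sage_mean_layer(
    features: Dict[str, Vector],
    neighbors: Dict[str, List[str]],
    weight: List[Vector],
) -> Dict[str, Vector]:
    """One GraphSAGE layer with the mean aggregator:
    h_v' = ReLU(W * concat(h_v, mean(h_u for u in N(v))))."""
    dim = len(next(iter(features.values())))
    updated = {}
    for node, h_self in features.items():
        h_neigh = mean_vector([features[u] for u in neighbors.get(node, [])], dim)
        z = matvec(weight, h_self + h_neigh)  # list concatenation = concat
        updated[node] = [max(0.0, value) for value in z]  # ReLU
    return updated


# Hypothetical three-node graph standing in for a tiny CPG fragment.
toy_features = {"a": [1.0, 0.0], "b": [0.0, 1.0], "c": [1.0, 1.0]}
toy_neighbors = {"a": ["b", "c"], "b": ["a"], "c": []}
# Assumed 2x4 weight: adds a node's own features to its neighbor mean.
toy_weight = [[1.0, 0.0, 1.0, 0.0], [0.0, 1.0, 0.0, 1.0]]

hidden = sage_mean_layer(toy_features, toy_neighbors, toy_weight)
print(hidden["a"])  # [1.5, 1.0]
```

Each node's updated vector mixes its own features with the average of its neighbors' features, which is the sense in which the aggregation yields local features. In a real model the weight matrix would be learned, and stacking several such layers would widen each node's neighborhood.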




Source journal

Journal of Information Security and Applications (Computer Science: Computer Networks and Communications)

CiteScore: 10.90

Self-citation rate: 5.40%

Publications: 206

Review time: 56 days

Journal description: Journal of Information Security and Applications (JISA) focuses on original research and practice-driven applications relevant to information security. JISA links a vibrant scientific and research community with industry professionals by offering a clear view of modern problems and challenges in information security and by identifying promising scientific and "best-practice" solutions. JISA issues balance original research work and innovative industrial approaches by internationally renowned information security experts and researchers.