DC-GAR: detecting vulnerabilities by utilizing graph properties and random walks to uncover richer features

IF 3.1 · Zone 2, Computer Science · Q3, COMPUTER SCIENCE, SOFTWARE ENGINEERING

Automated Software Engineering · Pub Date: 2025-06-14 · DOI: 10.1007/s10515-025-00532-6

Meng Wang, Xiao Han, Hong Zhang, Yiran Guo, Jiangfan Guo

Volume 32, Issue 2

Citations: 0

Abstract

Deep learning has become prominent in source code vulnerability detection because it can automatically extract complex feature representations from code, eliminating the need for manually defined rules or patterns. Some methods treat code as a text sequence; however, they often overlook its inherent structural information. Graph-based approaches, in contrast, capture structural relationships effectively, but the sparsity and inconsistency of code graph structures can lead to uneven feature-vector extraction, so the model may fail to adequately characterize important nodes or paths. To address this issue, we propose DC-GAR (Dual-channel Graph Neural Network combining Graph properties and Random walks), which integrates graph properties and random walks within a dual-channel graph neural network framework to enhance vulnerability detection. Specifically, graph properties capture global semantic features, while random walks provide context-dependent node structure information. The dual-channel graph neural network then uses the combination of these features for detection and classification. We have implemented DC-GAR and evaluated it on a dataset of 29,514 functions. Experimental results demonstrate that DC-GAR outperforms state-of-the-art vulnerability detectors, including FlawFinder, SySeVR, Devign, VulCNN, AMPLE, HardVD, CodeBERT, and GraphCodeBERT, in terms of accuracy and F1-score. Moreover, DC-GAR has proven effective and practical on real-world open-source projects.
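The abstract names two feature channels but gives no formulas for either. As a rough illustration only, the Python sketch below computes, for a small hypothetical control-flow graph, (1) a few simple graph properties per node (in/out degree, degree centrality, graph density) and (2) normalized visit frequencies from truncated random walks, then concatenates the two. The choice of properties, the walk parameters (`walks_per_node`, `walk_len`, `seed`), the graph format, and all function names are assumptions made for this sketch; they are not DC-GAR's actual feature extractor.

```python
import random
from collections import Counter

# Hypothetical representation: node -> list of successor nodes (e.g. CFG/PDG edges).
# Every successor is assumed to also appear as a key.
Graph = dict[str, list[str]]


def graph_property_features(g: Graph) -> dict[str, list[float]]:
    """Channel 1 (illustrative): a few structural properties per node."""
    nodes = list(g)
    n = len(nodes)
    in_deg = Counter(dst for dsts in g.values() for dst in dsts)
    num_edges = sum(len(dsts) for dsts in g.values())
    # Graph-level property shared by every node: directed edge density.
    density = num_edges / (n * (n - 1)) if n > 1 else 0.0
    feats = {}
    for v in nodes:
        in_d, out_d = in_deg[v], len(g[v])
        # Degree centrality: total degree normalized by n - 1.
        centrality = (in_d + out_d) / (n - 1) if n > 1 else 0.0
        feats[v] = [float(in_d), float(out_d), centrality, density]
    return feats


def random_walk_features(
    g: Graph, walks_per_node: int = 20, walk_len: int = 5, seed: int = 0
) -> dict[str, list[float]]:
    """Channel 2 (illustrative): normalized visit counts from truncated random walks."""
    rng = random.Random(seed)
    nodes = list(g)
    index = {v: i for i, v in enumerate(nodes)}
    feats = {}
    for start in nodes:
        counts = [0] * len(nodes)
        for _ in range(walks_per_node):
            cur = start
            for _ in range(walk_len):
                if not g[cur]:  # dead end such as an exit node: stop this walk
                    break
                cur = rng.choice(g[cur])
                counts[index[cur]] += 1
        total = sum(counts)
        feats[start] = [c / total if total else 0.0 for c in counts]
    return feats


def dual_channel_features(g: Graph) -> dict[str, list[float]]:
    """Concatenate both channels per node. In DC-GAR the features are consumed
    by a dual-channel GNN; this sketch stops at the feature vectors."""
    props = graph_property_features(g)
    walks = random_walk_features(g)
    return {v: props[v] + walks[v] for v in g}


if __name__ == "__main__":
    # Hypothetical CFG: entry -> check -> (copy | exit), copy -> exit.
    cfg = {"entry": ["check"], "check": ["copy", "exit"], "copy": ["exit"], "exit": []}
    for node, vec in dual_channel_features(cfg).items():
        print(node, [round(x, 2) for x in vec])
```

Running the script prints one 8-dimensional vector per node: four property features followed by four random-walk visit frequencies.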



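Source journal

Automated Software Engineering (Engineering & Technology: Computer Science, Software Engineering)

CiteScore: 4.80

Self-citation rate: 11.80%

Articles published:

Review time: >12 weeks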

Journal description: This journal publishes research papers, tutorial papers, surveys, and accounts of significant industrial experience in the foundations, techniques, tools, and applications of automated software engineering technology. This includes the study of techniques for constructing, understanding, adapting, and modeling software artifacts and processes. Coverage in Automated Software Engineering examines both automatic and collaborative systems, as well as computational models of human software engineering activities. In addition, the journal presents knowledge representations and artificial intelligence techniques applicable to automated software engineering, and formal techniques that support it or provide its theoretical foundations. It also includes reviews of books, software, conferences, and workshops.