Vulnerability detection with graph enhancement and global dependency representation learning

Impact factor 2; Zone 2 (Computer Science); Q3 COMPUTER SCIENCE, SOFTWARE ENGINEERING
Xuehai Jia, Junwei Du, Minying Fang, Hao Liu, Yuying Li, Feng Jiang
DOI: 10.1007/s10515-024-00484-3
Journal: Automated Software Engineering, volume 32, issue 1
Publication date: 2025-01-05
Publication type: Journal Article
Publisher page: https://link.springer.com/article/10.1007/s10515-024-00484-3
Open access: no
Citation count: 0

Abstract

Vulnerability detection is essential for protecting software systems from attacks. Graph neural networks (GNNs) have proven effective in capturing semantic features of code and are widely used for this purpose. Existing GNN-based methods typically merge multiple graphs and employ GNNs to learn syntactic and semantic relationships within code graph structures. However, these methods face a significant limitation: current code graph structures inadequately represent parameter dependencies and node type information, which are crucial for capturing vulnerability patterns. This inadequacy hampers the GNNs’ ability to discern and characterize vulnerable code, thereby undermining effective vulnerability detection. Additionally, traditional GNN-based methods may lose long-distance dependency information during aggregation, which is vital for understanding the behavior and occurrence patterns of vulnerable code. Despite achieving state-of-the-art performance, existing GNN-based methods struggle to fully understand vulnerability behaviors and their potential impacts. To address these issues, this paper introduces VulDecgre, a novel vulnerability detection model comprising two components: (1) An enhanced code graph structure that fuses multiple graphs and relational edges to improve code representation. (2) A natural sequence-aware learning module that integrates code execution sequence information to enhance vulnerability detection. Extensive experiments on three public datasets and a self-collected large-scale real-world C/C++ dataset demonstrate that VulDecgre achieves superior performance in vulnerability detection.

Source journal: Automated Software Engineering (Engineering and Technology - Computer Science: Software Engineering)
CiteScore: 4.80
Self-citation rate: 11.80%
Articles published: 51
Review time: >12 weeks
Journal description: This journal details research, tutorial papers, surveys and accounts of significant industrial experience in the foundations, techniques, tools and applications of automated software engineering technology. This includes the study of techniques for constructing, understanding, adapting, and modeling software artifacts and processes. Coverage in Automated Software Engineering examines both automatic systems and collaborative systems as well as computational models of human software engineering activities. In addition, it presents knowledge representations and artificial intelligence techniques applicable to automated software engineering, and formal techniques that support or provide theoretical foundations. The journal also includes reviews of books, software, conferences and workshops.