iGnnVD: A novel software vulnerability detection model based on integrated graph neural networks

IF 1.5 4区计算机科学 Q3 COMPUTER SCIENCE, SOFTWARE ENGINEERING

Science of Computer Programming Pub Date : 2024-06-06 DOI:10.1016/j.scico.2024.103156

Jinfu Chen , Yemin Yin , Saihua Cai , Weijia Wang , Shengran Wang , Jiming Chen

{"title":"iGnnVD: A novel software vulnerability detection model based on integrated graph neural networks","authors":"Jinfu Chen , Yemin Yin , Saihua Cai , Weijia Wang , Shengran Wang , Jiming Chen","doi":"10.1016/j.scico.2024.103156","DOIUrl":null,"url":null,"abstract":"<div><p>Software vulnerability detection is a challenging task in the security field, the boom of deep learning technology promotes the development of automatic vulnerability detection. Compared with sequence-based deep learning models, graph neural network (GNN) can learn the structural features of code, it performs well in the field of vulnerability detection for source code. However, different GNNs have different detection results for the same code, and using a single kind of GNN may lead to high false positive rate and false negative rate. In addition, the complex structure of source code causes single GNN model cannot effectively learn their depth feature, thereby leading to low detection accuracy. To solve these limitations, we propose a software vulnerability detection model called iGnnVD based on the integrated graph neural networks. In the proposed iGnnVD model, the base detectors including GCN, GAT and APPNP are first constructed to capture the bidirectional information in the code graph structure with bidirectional structure; And then, the residual connection is used to aggregate the features while retaining the features each time; Finally, the convolutional layer is used to perform the aggregated classification. In addition, an integration module that analyzes the detection results of three detectors for final classification is designed using a voting strategy to solve the problem of high false positive rate and false negative rate caused by using a single kind of base detector. We perform extensive experiments on three datasets and experimental results show that the proposed iGnnVD model can improve the detection accuracy of vulnerabilities in source code as well as reduce the false positive rate and false negative rate compared with existing deep learning-based vulnerability detection models, it also has good stability.</p></div>","PeriodicalId":49561,"journal":{"name":"Science of Computer Programming","volume":"238 ","pages":"Article 103156"},"PeriodicalIF":1.5000,"publicationDate":"2024-06-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Science of Computer Programming","FirstCategoryId":"94","ListUrlMain":"https://www.sciencedirect.com/science/article/pii/S0167642324000790","RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q3","JCRName":"COMPUTER SCIENCE, SOFTWARE ENGINEERING","Score":null,"Total":0}

引用次数: 0

Abstract

Software vulnerability detection is a challenging task in the security field, the boom of deep learning technology promotes the development of automatic vulnerability detection. Compared with sequence-based deep learning models, graph neural network (GNN) can learn the structural features of code, it performs well in the field of vulnerability detection for source code. However, different GNNs have different detection results for the same code, and using a single kind of GNN may lead to high false positive rate and false negative rate. In addition, the complex structure of source code causes single GNN model cannot effectively learn their depth feature, thereby leading to low detection accuracy. To solve these limitations, we propose a software vulnerability detection model called iGnnVD based on the integrated graph neural networks. In the proposed iGnnVD model, the base detectors including GCN, GAT and APPNP are first constructed to capture the bidirectional information in the code graph structure with bidirectional structure; And then, the residual connection is used to aggregate the features while retaining the features each time; Finally, the convolutional layer is used to perform the aggregated classification. In addition, an integration module that analyzes the detection results of three detectors for final classification is designed using a voting strategy to solve the problem of high false positive rate and false negative rate caused by using a single kind of base detector. We perform extensive experiments on three datasets and experimental results show that the proposed iGnnVD model can improve the detection accuracy of vulnerabilities in source code as well as reduce the false positive rate and false negative rate compared with existing deep learning-based vulnerability detection models, it also has good stability.

查看原文本刊更多论文

iGnnVD：基于集成图神经网络的新型软件漏洞检测模型

软件漏洞检测是安全领域一项极具挑战性的任务，深度学习技术的蓬勃发展推动了漏洞自动检测的发展。与基于序列的深度学习模型相比，图神经网络（GNN）可以学习代码的结构特征，在源代码漏洞检测领域表现出色。然而，不同的图神经网络对相同代码的检测结果不同，使用单一类型的图神经网络可能会导致较高的假阳性率和假阴性率。此外，源代码结构复杂，单一的 GNN 模型无法有效学习其深度特征，从而导致检测准确率较低。为了解决这些问题，我们提出了一种基于集成图神经网络的软件漏洞检测模型 iGnnVD。在所提出的 iGnnVD 模型中，首先构建了包括 GCN、GAT 和 APPNP 在内的基础检测器，以捕捉具有双向结构的代码图结构中的双向信息；然后，在保留每次特征的同时，利用残差连接对特征进行聚合；最后，利用卷积层进行聚合分类。此外，我们还设计了一个集成模块，利用投票策略分析三个检测器的检测结果，进行最终分类，以解决使用单一类型的基础检测器导致的高假阳性率和假阴性率问题。我们在三个数据集上进行了大量实验，实验结果表明，与现有的基于深度学习的漏洞检测模型相比，所提出的 iGnnVD 模型可以提高源代码中漏洞的检测精度，降低误报率和误负率，而且具有良好的稳定性。

本文章由计算机程序翻译，如有差异，请以英文原文为准。

求助全文

约1分钟内获得全文求助全文

来源期刊

Science of Computer Programming 工程技术-计算机：软件工程

CiteScore

3.80

自引率

0.00%

发文量

审稿时长

67 days

期刊介绍： Science of Computer Programming is dedicated to the distribution of research results in the areas of software systems development, use and maintenance, including the software aspects of hardware design. The journal has a wide scope ranging from the many facets of methodological foundations to the details of technical issues andthe aspects of industrial practice. The subjects of interest to SCP cover the entire spectrum of methods for the entire life cycle of software systems, including • Requirements, specification, design, validation, verification, coding, testing, maintenance, metrics and renovation of software; • Design, implementation and evaluation of programming languages; • Programming environments, development tools, visualisation and animation; • Management of the development process; • Human factors in software, software for social interaction, software for social computing; • Cyber physical systems, and software for the interaction between the physical and the machine; • Software aspects of infrastructure services, system administration, and network management.