Tensor-based gated graph neural network for automatic vulnerability detection in source code

Jia Yang, Ou Ruan, JiXin Zhang
Software Testing, Verification and Reliability (journal article), published 2023-11-27. DOI: https://doi.org/10.1002/stvr.1867

Abstract

The rapid expansion of smart devices has increased the demand for vulnerability detection in cyber security. Writing secure source code is crucial to protecting applications and software. Recent vulnerability detection methods mainly rely on machine learning and deep learning. However, several challenges remain: how to learn accurate semantic embeddings of source code at the function level, how to use these learned embeddings to detect vulnerabilities effectively, and how to mitigate overfitting in learning-based models. In this paper, we treat code as multiple graphs with node features and propose a tensor-based gated graph neural network, called TensorGNN, that produces code embeddings for function-level vulnerability detection. First, we propose a high-dimensional tensor that combines different code graph representations. Second, inspired by tensor techniques, we design the TensorGNN model to produce accurate code representations from this graph tensor. We evaluate our model on seven large open-source C and C++ code corpora (SARD&NVD, Debian, SATE IV, FFmpeg, libpng&LibTiff, Wireshark and GitHub datasets), which together contain 13 types of vulnerabilities. TensorGNN improves on existing state-of-the-art works by 10%–30% on average in vulnerability detection accuracy and F1, while requiring less training time and fewer model parameters. Specifically, compared with other existing works, our model uses 25–47 times fewer parameters and needs 3–10 times less training time. The evaluation results show that TensorGNN performs better while using fewer trainable parameters and less training time.
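The abstract does not give TensorGNN's equations, so the following is only a rough, hypothetical illustration of the two ideas it names: stacking several code graphs over the same nodes into one higher-order adjacency tensor, and running gated graph propagation over that tensor to obtain a function-level score. Everything here is an assumption rather than the authors' method: the three graph views (AST, CFG, DFG) and their edges, the helper names (`EDGE_VIEWS`, `build_graph_tensor`, `gated_step`, `predict_vulnerable`, `init_params`), the standard GGNN-style GRU update, the mean-pooling readout and the logistic output. The weights are random and untrained, so the printed probability carries no meaning.

```python
import math
import random

# Hypothetical edges for a toy function with 4 statement nodes. The choice of
# AST/CFG/DFG views is an assumption; the abstract only says several code graphs
# are combined into one high-dimensional tensor.
EDGE_VIEWS = {
    "ast": [(0, 1), (0, 2), (0, 3)],
    "cfg": [(1, 2), (2, 3)],
    "dfg": [(1, 3)],
}


def build_graph_tensor(edge_views, num_nodes):
    """Stack one n x n adjacency matrix per graph view into a K x n x n tensor."""
    tensor = []
    for edges in edge_views.values():
        adj = [[0.0] * num_nodes for _ in range(num_nodes)]
        for src, dst in edges:
            adj[dst][src] = 1.0  # row = receiving node, column = sending node
        tensor.append(adj)
    return tensor


def matvec(w, x):
    return [sum(wi * xi for wi, xi in zip(row, x)) for row in w]


def sigmoid(v):
    return [1.0 / (1.0 + math.exp(-x)) for x in v]


def rand_matrix(rng, rows, cols):
    return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]


def gated_step(tensor, h, params):
    """One GGNN-style step: per-view messages, then a GRU-gated state update."""
    n, d = len(h), len(h[0])
    new_h = []
    for v in range(n):
        # Sum messages from neighbours in every view k, each with its own W_k.
        m = [0.0] * d
        for k, adj in enumerate(tensor):
            for u in range(n):
                if adj[v][u]:
                    msg = matvec(params["W_edge"][k], h[u])
                    m = [a + adj[v][u] * b for a, b in zip(m, msg)]
        # GRU gates: z (update) and r (reset) mix the message with the old state.
        z = sigmoid([a + b for a, b in zip(matvec(params["Wz"], m), matvec(params["Uz"], h[v]))])
        r = sigmoid([a + b for a, b in zip(matvec(params["Wr"], m), matvec(params["Ur"], h[v]))])
        rh = [ri * hi for ri, hi in zip(r, h[v])]
        cand = [math.tanh(a + b) for a, b in zip(matvec(params["Wh"], m), matvec(params["Uh"], rh))]
        new_h.append([(1 - zi) * hi + zi * ci for zi, hi, ci in zip(z, h[v], cand)])
    return new_h


def predict_vulnerable(tensor, h, params, steps=3):
    """Propagate a few steps, mean-pool node states, and apply a logistic score."""
    for _ in range(steps):
        h = gated_step(tensor, h, params)
    pooled = [sum(col) / len(h) for col in zip(*h)]
    logit = sum(w * x for w, x in zip(params["w_out"], pooled))
    return 1.0 / (1.0 + math.exp(-logit))


def init_params(num_views, dim, seed=0):
    """Random, untrained weights (illustration only)."""
    rng = random.Random(seed)
    params = {name: rand_matrix(rng, dim, dim) for name in ("Wz", "Uz", "Wr", "Ur", "Wh", "Uh")}
    params["W_edge"] = [rand_matrix(rng, dim, dim) for _ in range(num_views)]
    params["w_out"] = [rng.uniform(-0.5, 0.5) for _ in range(dim)]
    return params


if __name__ == "__main__":
    n, d = 4, 3
    tensor = build_graph_tensor(EDGE_VIEWS, n)
    rng = random.Random(1)
    h0 = [[rng.uniform(-1, 1) for _ in range(d)] for _ in range(n)]
    params = init_params(len(tensor), d)
    print(f"P(vulnerable) = {predict_vulnerable(tensor, h0, params):.3f}")
```

Keeping one weight matrix per view lets the model weigh syntactic, control-flow and data-flow edges differently while sharing the same GRU update; how TensorGNN itself exploits the tensor structure to cut parameters is not described in the abstract.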

