基于张量的门控图神经网络漏洞自动检测的源代码

Software Testing, Verification and Reliability Pub Date : 2023-11-27 DOI:10.1002/stvr.1867

Jia Yang, Ou Ruan, JiXin Zhang

{"title":"基于张量的门控图神经网络漏洞自动检测的源代码","authors":"Jia Yang, Ou Ruan, JiXin Zhang","doi":"10.1002/stvr.1867","DOIUrl":null,"url":null,"abstract":"The rapid expansion of smart devices leads to the increasing demand for vulnerability detection in the cyber security field. Writing secure source codes is crucial to protect applications and software. Recent vulnerability detection methods are mainly using machine learning and deep learning. However, there are still some challenges, how to learn accurate source code semantic embedding at the function level, how to effectively perform vulnerability detection using the learned semantic embedding of source code and how to solve the overfitting problem of learning-based models. In this paper, we consider codes as various graphs with node features and propose a tensor-based gated graph neural network called TensorGNN to produce code embedding for function-level vulnerability detection. First, we propose a high-dimensional tensor for combining different code graph representations. Second, inspired by the work of tensor technology, we propose the TensorGNN model to produce accurate code representations using the graph tensor. We evaluate our model on 7 C and C++ large open-source code corpus (e.g. SARD&NVD, Debian, SATE IV, FFmpeg, libpng&LibTiff, Wireshark and Github datasets), which contains 13 types of vulnerabilities. Our TensorGNN model improves on existing state-of-the-art works by 10%–30% on average in terms of vulnerability detection accuracy and F1, while our TensorGNN model needs less training time and model parameters. Specifically, compared with other existing works, our model reduces 25–47 times of the number of parameters and decreases 3–10 times of training time. Results of evaluations show that TensorGNN has better performance while using fewer training parameters and less training time.","PeriodicalId":501413,"journal":{"name":"Software Testing, Verification and Reliability","volume":"6 1","pages":""},"PeriodicalIF":0.0000,"publicationDate":"2023-11-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"Tensor-based gated graph neural network for automatic vulnerability detection in source code\",\"authors\":\"Jia Yang, Ou Ruan, JiXin Zhang\",\"doi\":\"10.1002/stvr.1867\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"The rapid expansion of smart devices leads to the increasing demand for vulnerability detection in the cyber security field. Writing secure source codes is crucial to protect applications and software. Recent vulnerability detection methods are mainly using machine learning and deep learning. However, there are still some challenges, how to learn accurate source code semantic embedding at the function level, how to effectively perform vulnerability detection using the learned semantic embedding of source code and how to solve the overfitting problem of learning-based models. In this paper, we consider codes as various graphs with node features and propose a tensor-based gated graph neural network called TensorGNN to produce code embedding for function-level vulnerability detection. First, we propose a high-dimensional tensor for combining different code graph representations. Second, inspired by the work of tensor technology, we propose the TensorGNN model to produce accurate code representations using the graph tensor. We evaluate our model on 7 C and C++ large open-source code corpus (e.g. SARD&NVD, Debian, SATE IV, FFmpeg, libpng&LibTiff, Wireshark and Github datasets), which contains 13 types of vulnerabilities. Our TensorGNN model improves on existing state-of-the-art works by 10%–30% on average in terms of vulnerability detection accuracy and F1, while our TensorGNN model needs less training time and model parameters. Specifically, compared with other existing works, our model reduces 25–47 times of the number of parameters and decreases 3–10 times of training time. Results of evaluations show that TensorGNN has better performance while using fewer training parameters and less training time.\",\"PeriodicalId\":501413,\"journal\":{\"name\":\"Software Testing, Verification and Reliability\",\"volume\":\"6 1\",\"pages\":\"\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2023-11-27\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Software Testing, Verification and Reliability\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1002/stvr.1867\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Software Testing, Verification and Reliability","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1002/stvr.1867","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}

引用次数: 0

摘要

智能设备的快速扩张导致网络安全领域对漏洞检测的需求不断增加。编写安全的源代码对于保护应用程序和软件至关重要。最近的漏洞检测方法主要是利用机器学习和深度学习。然而，如何在功能层学习准确的源代码语义嵌入，如何利用学习到的源代码语义嵌入有效地进行漏洞检测，如何解决基于学习的模型的过拟合问题，仍然存在一些挑战。本文将代码视为具有节点特征的各种图，提出了一种基于张量的门控图神经网络TensorGNN，用于生成用于函数级漏洞检测的代码嵌入。首先，我们提出了一个高维张量来组合不同的代码图表示。其次，受张量技术的启发，我们提出了TensorGNN模型来使用图张量生成准确的代码表示。我们在7个C和c++大型开源代码语料库(例如sardnvd、Debian、SATE IV、FFmpeg、libpnglibtiff、Wireshark和Github数据集)上评估了我们的模型，其中包含13种类型的漏洞。我们的TensorGNN模型在漏洞检测精度和F1方面比现有的先进成果平均提高了10%-30%，同时我们的TensorGNN模型需要更少的训练时间和模型参数。具体来说，与其他已有作品相比，我们的模型减少了25-47倍的参数数量，减少了3-10倍的训练时间。评价结果表明，使用更少的训练参数和更少的训练时间，TensorGNN具有更好的性能。

本文章由计算机程序翻译，如有差异，请以英文原文为准。

Tensor-based gated graph neural network for automatic vulnerability detection in source code

查看原文本刊更多论文

Tensor-based gated graph neural network for automatic vulnerability detection in source code

The rapid expansion of smart devices leads to the increasing demand for vulnerability detection in the cyber security field. Writing secure source codes is crucial to protect applications and software. Recent vulnerability detection methods are mainly using machine learning and deep learning. However, there are still some challenges, how to learn accurate source code semantic embedding at the function level, how to effectively perform vulnerability detection using the learned semantic embedding of source code and how to solve the overfitting problem of learning-based models. In this paper, we consider codes as various graphs with node features and propose a tensor-based gated graph neural network called TensorGNN to produce code embedding for function-level vulnerability detection. First, we propose a high-dimensional tensor for combining different code graph representations. Second, inspired by the work of tensor technology, we propose the TensorGNN model to produce accurate code representations using the graph tensor. We evaluate our model on 7 C and C++ large open-source code corpus (e.g. SARD&NVD, Debian, SATE IV, FFmpeg, libpng&LibTiff, Wireshark and Github datasets), which contains 13 types of vulnerabilities. Our TensorGNN model improves on existing state-of-the-art works by 10%–30% on average in terms of vulnerability detection accuracy and F1, while our TensorGNN model needs less training time and model parameters. Specifically, compared with other existing works, our model reduces 25–47 times of the number of parameters and decreases 3–10 times of training time. Results of evaluations show that TensorGNN has better performance while using fewer training parameters and less training time.

求助全文

通过发布文献求助，成功后即可免费获取论文全文。去求助

来源期刊

Software Testing, Verification and Reliability

自引率

0.00%

发文量