Vulmg: A Static Detection Solution For Source Code Vulnerabilities Based On Code Property Graph and Graph Attention Network

2021 18th International Computer Conference on Wavelet Active Media Technology and Information Processing (ICCWAMTIP)
Pub Date: 2021-12-17
DOI: 10.1109/ICCWAMTIP53232.2021.9674145

Zhang Haojie, Liao Yujun, Liu Yiwei, Zhou Nanxin


Citations: 1

Abstract

As the number of reported vulnerabilities continues to rise, so do the security incidents they trigger. Current vulnerability detection methods still have limitations: many analyze only a single function at a time, rely heavily on expert knowledge, or cannot be fully automated. From our observation of the Juliet dataset, we find that vulnerabilities exist not only within a single function but also across the boundary between a calling function and the function it calls. We also observe that vulnerable and non-vulnerable functions differ in their code property graphs. This article therefore proposes a vulnerability detection solution named VULMG, which casts vulnerability detection as a graph classification problem. VULMG consists of a vectorization component named VecG and a deep learning classification model named MGGAT. Working from the code property graph, VecG extracts the lexical, syntactic, and semantic information of the source code as a feature matrix, and extracts structure, control, and dependence information as three adjacency matrices. MGGAT is a deep learning model based on the graph attention network and performs the graph classification. In addition, VULMG uses the function call graph (FCG) to associate each calling function with the function it calls, so that it can detect cross-function vulnerabilities. We selected CWE369 and CWE476 from the Juliet dataset for testing, and the F1 scores were 94.43% and 96.3%, respectively. The evaluation results indicate that VULMG outperforms Flawfinder, RATS, BiLSTM, SVM, and GCN, which supports the effectiveness of the proposed solution.
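
To make the data flow described in the abstract concrete, the following is a minimal sketch, not the authors' MGGAT. It assumes one single-head graph attention layer per adjacency matrix, concatenation of the three node embeddings, mean pooling over nodes, and a logistic output; none of these choices is specified in the abstract. The matrix names `ast`, `cfg`, and `pdg` are likewise assumptions that follow the usual composition of a code property graph; the abstract only speaks of structure, control, and dependence. Features and weights are random toy values, so the printed probability carries no meaning.

```python
import numpy as np


def gat_layer(X, A, W, a, slope=0.2):
    """Single-head graph attention layer (hypothetical helper).

    Row i attends over itself and over every node j with A[i, j] != 0.
    """
    n = X.shape[0]
    H = X @ W                                   # projected node features, (n, d_out)
    d = H.shape[1]
    # score[i, j] = LeakyReLU(a^T [h_i || h_j]), split into source and target parts
    scores = (H @ a[:d])[:, None] + (H @ a[d:])[None, :]
    scores = np.where(scores > 0, scores, slope * scores)
    # self-loops guarantee that every row has at least one admissible entry
    mask = (A + np.eye(n)) > 0
    scores = np.where(mask, scores, -np.inf)
    # numerically stable softmax over each row
    scores = scores - scores.max(axis=1, keepdims=True)
    alpha = np.exp(scores)
    alpha = alpha / alpha.sum(axis=1, keepdims=True)
    out = alpha @ H
    # ELU non-linearity
    return np.where(out > 0, out, np.expm1(np.minimum(out, 0.0)))


def classify_graph(X, adjs, layers, w_out, b_out):
    """Graph-level probability from one feature matrix and several adjacency matrices."""
    # one attention layer per relation, node embeddings concatenated column-wise
    H = np.concatenate(
        [gat_layer(X, A, W, a) for A, (W, a) in zip(adjs, layers)], axis=1
    )
    g = H.mean(axis=0)                          # mean-pool readout over nodes
    return 1.0 / (1.0 + np.exp(-(g @ w_out + b_out)))


# Toy graph with 4 nodes; all values below are made up for illustration.
rng = np.random.default_rng(0)
n, d_in, d_out = 4, 5, 8
X = rng.normal(size=(n, d_in))                  # stand-in for the VecG feature matrix

ast = np.array([[0, 1, 1, 0], [0, 0, 0, 1], [0, 0, 0, 0], [0, 0, 0, 0]], dtype=float)
cfg = np.array([[0, 1, 0, 0], [0, 0, 1, 0], [0, 0, 0, 1], [0, 0, 0, 0]], dtype=float)
pdg = np.array([[0, 0, 0, 1], [0, 0, 0, 1], [0, 0, 0, 0], [0, 0, 0, 0]], dtype=float)
adjs = [ast, cfg, pdg]

# Untrained random weights: (W, a) per relation, plus the output layer.
layers = [
    (rng.normal(size=(d_in, d_out)) * 0.1, rng.normal(size=2 * d_out) * 0.1)
    for _ in adjs
]
w_out = rng.normal(size=len(adjs) * d_out) * 0.1
b_out = 0.0

p = classify_graph(X, adjs, layers, w_out, b_out)
print(f"P(vulnerable) = {p:.3f}")
```

Keeping one attention layer per relation lets each edge type learn its own attention weights before the embeddings are merged. To cover a caller and its callee together, the two functions' graphs could be merged into one node set with extra call edges before this step; how VULMG actually does this with the FCG is not detailed in the abstract.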

