Hunting bugs with accelerated optimal graph vertex matching

Xiaohui Zhang, Yuanjun Gong, Bin Liang, Jianjun Huang, Wei You, Wenchang Shi, Jian Zhang
{"title":"Hunting bugs with accelerated optimal graph vertex matching","authors":"Xiaohui Zhang, Yuanjun Gong, Bin Liang, Jianjun Huang, Wei You, Wenchang Shi, Jian Zhang","doi":"10.1145/3533767.3534393","DOIUrl":null,"url":null,"abstract":"Various techniques based on code similarity measurement have been proposed to detect bugs. Essentially, the code fragment can be regarded as a kind of graph. Performing code graph similarity comparison to identify the potential bugs is a natural choice. However, the logic of a bug often involves only a few statements in the code fragment, while others are bug-irrelevant. They can be considered as a kind of noise, and can heavily interfere with the code similarity measurement. In theory, performing optimal vertex matching can address the problem well, but the task is NP-complete and cannot be applied to a large-scale code base. In this paper, we propose a two-phase strategy to accelerate code graph vertex matching for detecting bugs. In the first phase, a vertex matching embedding model is trained and used to rapidly filter a limited number of candidate code graphs from the target code base, which are likely to have a high vertex matching degree with the seed, i.e., the known buggy code. As a result, the number of code graphs needed to be further analyzed is dramatically reduced. In the second phase, a high-order similarity embedding model based on graph convolutional neural network is built to efficiently get the approximately optimal vertex matching between the seed and candidates. On this basis, the code graph similarity is calculated to identify the potential buggy code. The proposed method is applied to five open source projects. In total, 31 unknown bugs were successfully detected and confirmed by developers. Comparative experiments demonstrate that our method can effectively mitigate the noise problem, and the detection efficiency can be improved dozens of times with the two-phase strategy.","PeriodicalId":412271,"journal":{"name":"Proceedings of the 31st ACM SIGSOFT International Symposium on Software Testing and Analysis","volume":"199 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2022-07-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"1","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Proceedings of the 31st ACM SIGSOFT International Symposium on Software Testing and Analysis","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1145/3533767.3534393","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 1

Abstract

Various techniques based on code similarity measurement have been proposed to detect bugs. Essentially, the code fragment can be regarded as a kind of graph. Performing code graph similarity comparison to identify the potential bugs is a natural choice. However, the logic of a bug often involves only a few statements in the code fragment, while others are bug-irrelevant. They can be considered as a kind of noise, and can heavily interfere with the code similarity measurement. In theory, performing optimal vertex matching can address the problem well, but the task is NP-complete and cannot be applied to a large-scale code base. In this paper, we propose a two-phase strategy to accelerate code graph vertex matching for detecting bugs. In the first phase, a vertex matching embedding model is trained and used to rapidly filter a limited number of candidate code graphs from the target code base, which are likely to have a high vertex matching degree with the seed, i.e., the known buggy code. As a result, the number of code graphs needed to be further analyzed is dramatically reduced. In the second phase, a high-order similarity embedding model based on graph convolutional neural network is built to efficiently get the approximately optimal vertex matching between the seed and candidates. On this basis, the code graph similarity is calculated to identify the potential buggy code. The proposed method is applied to five open source projects. In total, 31 unknown bugs were successfully detected and confirmed by developers. Comparative experiments demonstrate that our method can effectively mitigate the noise problem, and the detection efficiency can be improved dozens of times with the two-phase strategy.
用加速最优图顶点匹配寻找bug
人们提出了各种基于代码相似度度量的技术来检测bug。从本质上讲,代码片段可以看作是一种图。执行代码图相似性比较来识别潜在的bug是一种自然的选择。然而,错误的逻辑通常只涉及代码片段中的几个语句,而其他语句与错误无关。它们可以看作是一种噪声,严重干扰代码相似度的测量。理论上,执行最优顶点匹配可以很好地解决这个问题,但这个任务是np完备的,不能应用于大规模的代码库。在本文中,我们提出了一种两阶段策略来加速代码图顶点匹配以检测bug。在第一阶段,训练一个顶点匹配嵌入模型,从目标代码库中快速筛选有限数量的候选代码图,这些候选代码图可能与种子(即已知的bug代码)具有较高的顶点匹配度。因此,需要进一步分析的代码图的数量大大减少了。第二阶段,建立基于图卷积神经网络的高阶相似度嵌入模型,有效地获得种子与候选点之间的近似最优顶点匹配;在此基础上,计算代码图相似度来识别潜在的bug代码。将该方法应用于五个开源项目。总共有31个未知bug被开发人员成功检测并确认。对比实验表明,该方法可以有效地缓解噪声问题,并且两相策略的检测效率可以提高数十倍。
本文章由计算机程序翻译,如有差异,请以英文原文为准。
求助全文
约1分钟内获得全文 求助全文
来源期刊
自引率
0.00%
发文量
0
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
确定
请完成安全验证×
copy
已复制链接
快去分享给好友吧!
我知道了
右上角分享
点击右上角分享
0
联系我们:info@booksci.cn Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。 Copyright © 2023 布克学术 All rights reserved.
京ICP备2023020795号-1
ghs 京公网安备 11010802042870号
Book学术文献互助
Book学术文献互助群
群 号:481959085
Book学术官方微信