(Partial) Program Dependence Learning

Aashish Yadavally, T. Nguyen, Wenbo Wang, Shaohua Wang
{"title":"(Partial) Program Dependence Learning","authors":"Aashish Yadavally, T. Nguyen, Wenbo Wang, Shaohua Wang","doi":"10.1109/ICSE48619.2023.00209","DOIUrl":null,"url":null,"abstract":"Code fragments from developer forums often migrate to applications due to the code reuse practice. Owing to the incomplete nature of such programs, analyzing them to early determine the presence of potential vulnerabilities is challenging. In this work, we introduce NeuralPDA, a neural network-based program dependence analysis tool for both complete and partial programs. Our tool efficiently incorporates intra-statement and inter-statement contextual features into statement representations, thereby modeling program dependence analysis as a statement-pair dependence decoding task. In the empirical evaluation, we report that NeuralPDA predicts the CFG and PDG edges in complete Java and C/C++ code with combined F-scores of 94.29% and 92.46%, respectively. The F-score values for partial Java and C/C++ code range from 94.29%-97.17% and 92.46%-96.01%, respectively. We also test the usefulness of the PDGs predicted by NeuralPDA (i.e., PDG*) on the downstream task of method-level vulnerability detection. We discover that the performance of the vulnerability detection tool utilizing PDG* is only 1.1% less than that utilizing the PDGs generated by a program analysis tool. We also report the detection of 14 real-world vulnerable code snippets from StackOverflow by a machine learning-based vulnerability detection tool that employs the PDGs predicted by NeuralPDA for these code snippets.","PeriodicalId":376379,"journal":{"name":"2023 IEEE/ACM 45th International Conference on Software Engineering (ICSE)","volume":"110 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2023-05-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"2023 IEEE/ACM 45th International Conference on Software Engineering (ICSE)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/ICSE48619.2023.00209","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 0

Abstract

Code fragments from developer forums often migrate to applications due to the code reuse practice. Owing to the incomplete nature of such programs, analyzing them to early determine the presence of potential vulnerabilities is challenging. In this work, we introduce NeuralPDA, a neural network-based program dependence analysis tool for both complete and partial programs. Our tool efficiently incorporates intra-statement and inter-statement contextual features into statement representations, thereby modeling program dependence analysis as a statement-pair dependence decoding task. In the empirical evaluation, we report that NeuralPDA predicts the CFG and PDG edges in complete Java and C/C++ code with combined F-scores of 94.29% and 92.46%, respectively. The F-score values for partial Java and C/C++ code range from 94.29%-97.17% and 92.46%-96.01%, respectively. We also test the usefulness of the PDGs predicted by NeuralPDA (i.e., PDG*) on the downstream task of method-level vulnerability detection. We discover that the performance of the vulnerability detection tool utilizing PDG* is only 1.1% less than that utilizing the PDGs generated by a program analysis tool. We also report the detection of 14 real-world vulnerable code snippets from StackOverflow by a machine learning-based vulnerability detection tool that employs the PDGs predicted by NeuralPDA for these code snippets.
(部分)程序依赖学习
由于代码重用实践,来自开发人员论坛的代码片段经常迁移到应用程序中。由于此类程序的不完全性,分析它们以早期确定潜在漏洞的存在是具有挑战性的。在这项工作中,我们介绍了NeuralPDA,一个基于神经网络的程序依赖分析工具,用于完整和部分程序。我们的工具有效地将语句内和语句间的上下文特征整合到语句表示中,从而将程序依赖性分析建模为语句对依赖性解码任务。在实证评估中,我们报告了NeuralPDA在完整的Java和C/ c++代码中预测CFG和PDG边的f值分别为94.29%和92.46%。部分Java和C/ c++代码的f值分别为94.29%-97.17%和92.46%-96.01%。我们还测试了由NeuralPDA(即PDG*)预测的PDGs在方法级漏洞检测的下游任务中的有用性。我们发现使用PDG*的漏洞检测工具的性能仅比使用程序分析工具生成的PDG低1.1%。我们还报告了基于机器学习的漏洞检测工具从StackOverflow中检测出的14个真实世界的脆弱代码片段,该工具使用了NeuralPDA预测的这些代码片段的PDGs。
本文章由计算机程序翻译,如有差异,请以英文原文为准。
求助全文
约1分钟内获得全文 求助全文
来源期刊
自引率
0.00%
发文量
0
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
确定
请完成安全验证×
copy
已复制链接
快去分享给好友吧!
我知道了
右上角分享
点击右上角分享
0
联系我们:info@booksci.cn Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。 Copyright © 2023 布克学术 All rights reserved.
京ICP备2023020795号-1
ghs 京公网安备 11010802042870号
Book学术文献互助
Book学术文献互助群
群 号:604180095
Book学术官方微信