(Partial) Program Dependence Learning

2023 IEEE/ACM 45th International Conference on Software Engineering (ICSE) Pub Date : 2023-05-01 DOI:10.1109/ICSE48619.2023.00209

Aashish Yadavally, T. Nguyen, Wenbo Wang, Shaohua Wang

{"title":"(Partial) Program Dependence Learning","authors":"Aashish Yadavally, T. Nguyen, Wenbo Wang, Shaohua Wang","doi":"10.1109/ICSE48619.2023.00209","DOIUrl":null,"url":null,"abstract":"Code fragments from developer forums often migrate to applications due to the code reuse practice. Owing to the incomplete nature of such programs, analyzing them to early determine the presence of potential vulnerabilities is challenging. In this work, we introduce NeuralPDA, a neural network-based program dependence analysis tool for both complete and partial programs. Our tool efficiently incorporates intra-statement and inter-statement contextual features into statement representations, thereby modeling program dependence analysis as a statement-pair dependence decoding task. In the empirical evaluation, we report that NeuralPDA predicts the CFG and PDG edges in complete Java and C/C++ code with combined F-scores of 94.29% and 92.46%, respectively. The F-score values for partial Java and C/C++ code range from 94.29%-97.17% and 92.46%-96.01%, respectively. We also test the usefulness of the PDGs predicted by NeuralPDA (i.e., PDG*) on the downstream task of method-level vulnerability detection. We discover that the performance of the vulnerability detection tool utilizing PDG* is only 1.1% less than that utilizing the PDGs generated by a program analysis tool. We also report the detection of 14 real-world vulnerable code snippets from StackOverflow by a machine learning-based vulnerability detection tool that employs the PDGs predicted by NeuralPDA for these code snippets.","PeriodicalId":376379,"journal":{"name":"2023 IEEE/ACM 45th International Conference on Software Engineering (ICSE)","volume":"110 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2023-05-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"2023 IEEE/ACM 45th International Conference on Software Engineering (ICSE)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/ICSE48619.2023.00209","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}

引用次数: 0

Abstract

Code fragments from developer forums often migrate to applications due to the code reuse practice. Owing to the incomplete nature of such programs, analyzing them to early determine the presence of potential vulnerabilities is challenging. In this work, we introduce NeuralPDA, a neural network-based program dependence analysis tool for both complete and partial programs. Our tool efficiently incorporates intra-statement and inter-statement contextual features into statement representations, thereby modeling program dependence analysis as a statement-pair dependence decoding task. In the empirical evaluation, we report that NeuralPDA predicts the CFG and PDG edges in complete Java and C/C++ code with combined F-scores of 94.29% and 92.46%, respectively. The F-score values for partial Java and C/C++ code range from 94.29%-97.17% and 92.46%-96.01%, respectively. We also test the usefulness of the PDGs predicted by NeuralPDA (i.e., PDG*) on the downstream task of method-level vulnerability detection. We discover that the performance of the vulnerability detection tool utilizing PDG* is only 1.1% less than that utilizing the PDGs generated by a program analysis tool. We also report the detection of 14 real-world vulnerable code snippets from StackOverflow by a machine learning-based vulnerability detection tool that employs the PDGs predicted by NeuralPDA for these code snippets.

查看原文本刊更多论文

(部分)程序依赖学习

由于代码重用实践，来自开发人员论坛的代码片段经常迁移到应用程序中。由于此类程序的不完全性，分析它们以早期确定潜在漏洞的存在是具有挑战性的。在这项工作中，我们介绍了NeuralPDA，一个基于神经网络的程序依赖分析工具，用于完整和部分程序。我们的工具有效地将语句内和语句间的上下文特征整合到语句表示中，从而将程序依赖性分析建模为语句对依赖性解码任务。在实证评估中，我们报告了NeuralPDA在完整的Java和C/ c++代码中预测CFG和PDG边的f值分别为94.29%和92.46%。部分Java和C/ c++代码的f值分别为94.29%-97.17%和92.46%-96.01%。我们还测试了由NeuralPDA(即PDG*)预测的PDGs在方法级漏洞检测的下游任务中的有用性。我们发现使用PDG*的漏洞检测工具的性能仅比使用程序分析工具生成的PDG低1.1%。我们还报告了基于机器学习的漏洞检测工具从StackOverflow中检测出的14个真实世界的脆弱代码片段，该工具使用了NeuralPDA预测的这些代码片段的PDGs。

本文章由计算机程序翻译，如有差异，请以英文原文为准。

求助全文

约1分钟内获得全文求助全文

来源期刊

2023 IEEE/ACM 45th International Conference on Software Engineering (ICSE)

自引率

0.00%

发文量