IntJect: Vulnerability Intent Bug Seeding

2022 IEEE 22nd International Conference on Software Quality, Reliability and Security (QRS) Pub Date : 2022-12-01 DOI:10.1109/QRS57517.2022.00013

Benjamin Petit, Ahmed Khanfir, E. Soremekun, Gilles Perrouin, Mike Papadakis



Abstract

Studying and exposing software vulnerabilities is important to ensure software security, safety, and reliability. Software engineers often inject vulnerabilities into their programs to test the reliability of their test suites, vulnerability detectors, and security measures. However, state-of-the-art vulnerability injection methods capture only code syntax and patterns; they do not learn the intent of the vulnerability and are limited to the syntax of the original dataset. To address this challenge, we propose the first intent-based vulnerability injection method that learns both the program syntax and the vulnerability intent. Our approach combines NLP methods with semantic-preserving program mutations (at the bytecode level) to inject code vulnerabilities. Given a dataset of known vulnerabilities (containing benign and vulnerable code pairs), our approach first applies semantic-preserving program mutations to transform the existing dataset into semantically similar code. Then, it learns the intent of the vulnerability via neural machine translation (Seq2Seq) models. The key insight is to employ Seq2Seq to learn the intent (context) of the vulnerable code in a manner that is agnostic to the specific program instance. We evaluate the performance of our approach using 1275 vulnerabilities belonging to five (5) CWEs from the Juliet test suite. We examine the effectiveness of our approach in producing compilable and vulnerable code. Our results show that IntJECT is effective: almost all (99%) of the code produced by our approach is vulnerable and compilable. We also demonstrate that the vulnerable programs generated by IntJECT are semantically similar to the withheld original vulnerable code. Finally, we show that our mutation-based data transformation approach outperforms its alternatives, namely data obfuscation and using the original data.
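As an illustration only, the data-transformation step described in the abstract can be sketched in a few lines of Python. This is not the authors' implementation: the paper applies mutations at the bytecode level, whereas this toy works on source text and uses consistent identifier renaming as a stand-in for a semantic-preserving mutation. The function names (`rename_identifiers`, `augment`), the `KEYWORDS` set, and the sample code pair are all hypothetical.

```python
import re
from itertools import count

# Hypothetical set of tokens that must not be renamed in this toy example.
KEYWORDS = {"int", "new", "if", "return", "length"}

IDENT = re.compile(r"\b[A-Za-z_]\w*\b")


def rename_identifiers(benign, vulnerable, prefix="v"):
    """Rename every non-keyword identifier consistently across BOTH halves
    of a (benign, vulnerable) pair, so the pair stays aligned."""
    mapping = {}
    fresh = count()

    def substitute(match):
        name = match.group(0)
        if name in KEYWORDS:
            return name
        if name not in mapping:
            mapping[name] = f"{prefix}{next(fresh)}"
        return mapping[name]

    return IDENT.sub(substitute, benign), IDENT.sub(substitute, vulnerable)


def augment(pairs, variants=2):
    """Return the original pairs plus `variants` renamed copies of each,
    as candidate training data for a sequence-to-sequence model."""
    out = list(pairs)
    for benign, vulnerable in pairs:
        for i in range(variants):
            out.append(rename_identifiers(benign, vulnerable, prefix=f"m{i}_"))
    return out


if __name__ == "__main__":
    pair = ("if (idx < data.length) return data[idx];", "return data[idx];")
    for benign, vulnerable in augment([pair]):
        print(benign, "=>", vulnerable)
```

The point of the sketch is that every variant keeps the same benign-to-vulnerable difference (here, a dropped bounds check) while the surface names change, which is one plausible way a model could be pushed toward the intent rather than the specific program instance.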

