On Comparing Mutation Testing Tools through Learning-based Mutant Selection

Miloš Ojdanić, Ahmed Khanfir, Aayush Garg, Renzo Degiovanni, Mike Papadakis, Y. Le Traon
{"title":"On Comparing Mutation Testing Tools through Learning-based Mutant Selection","authors":"Miloš Ojdanić, Ahmed Khanfir, Aayush Garg, Renzo Degiovanni, Mike Papadakis, Y. Le Traon","doi":"10.1109/AST58925.2023.00008","DOIUrl":null,"url":null,"abstract":"Recently many mutation testing tools have been proposed that rely on bug-fix patterns and natural language models trained on large code corpus. As these tools operate fundamentally differently from the grammar-based traditional approaches, a question arises of how these tools compare in terms of 1) fault detection and 2) cost-effectiveness. Simultaneously, mutation testing research proposes mutant selection approaches based on machine learning to mitigate its application cost. This raises another question: How do the existing mutation testing tools compare when guided by mutant selection approaches? To answer these questions, we compare four existing tools – μBERT (uses pre-trained language model for fault seeding), IBIR (relies on inverted fix-patterns), DeepMutation (generates mutants by employing Neural Machine Translation) and PIT (applies standard grammar-based rules) in terms of fault detection capability and cost-effectiveness, in conjunction with standard and deep learning based mutant selection strategies. Our results show that IBIR has the highest fault detection capability among the four tools; however, it is not the most cost-effective when considering different selection strategies. On the other hand, μBERT having a relatively lower fault detection capability, is the most cost-effective among the four tools. Our results also indicate that comparing mutation testing tools when using deep learning-based mutant selection strategies can lead to different conclusions than the standard mutant selection. For instance, our results demonstrate that combining μBERT with deep learning-based mutant selection yields 12% higher fault detection than the considered tools.","PeriodicalId":252417,"journal":{"name":"2023 IEEE/ACM International Conference on Automation of Software Test (AST)","volume":"3 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2023-05-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"2023 IEEE/ACM International Conference on Automation of Software Test (AST)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/AST58925.2023.00008","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}

Abstract

Recently, many mutation testing tools have been proposed that rely on bug-fix patterns and natural language models trained on large code corpora. As these tools operate fundamentally differently from traditional grammar-based approaches, a question arises of how they compare in terms of 1) fault detection and 2) cost-effectiveness. Simultaneously, mutation testing research proposes mutant selection approaches based on machine learning to mitigate the technique's application cost. This raises another question: how do the existing mutation testing tools compare when guided by mutant selection approaches? To answer these questions, we compare four existing tools: μBERT (uses a pre-trained language model for fault seeding), IBIR (relies on inverted fix-patterns), DeepMutation (generates mutants by employing Neural Machine Translation), and PIT (applies standard grammar-based rules), in terms of fault detection capability and cost-effectiveness, in conjunction with standard and deep learning-based mutant selection strategies. Our results show that IBIR has the highest fault detection capability among the four tools; however, it is not the most cost-effective when considering different selection strategies. On the other hand, μBERT, despite its relatively lower fault detection capability, is the most cost-effective of the four. Our results also indicate that comparing mutation testing tools under deep learning-based mutant selection strategies can lead to different conclusions than under standard mutant selection. For instance, our results demonstrate that combining μBERT with deep learning-based mutant selection yields 12% higher fault detection than the other considered tools.
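To make the comparison concrete, the sketch below (a hypothetical Java class; the method and mutant names are illustrative, not taken from the paper) contrasts the two mutant families being compared: a grammar-based mutant of the kind PIT's conditionals-boundary operator produces, and a token-replacement mutant of the kind a pre-trained language model such as μBERT predicts by masking a token and choosing a plausible substitute. A mutant is "killed" when at least one test observes behaviour that differs from the original program; fault detection and cost-effectiveness are then measured over the sets of mutants each tool generates.

// Minimal sketch (hypothetical names): contrasts a grammar-based mutant
// with a language-model-style mutant. Not the paper's code.
public class MutantSketch {

    // Original implementation.
    static boolean isAdult(int age) {
        return age >= 18;
    }

    // Grammar-based mutant (PIT-style conditionals boundary): '>=' -> '>'.
    // Behaviour differs only at the boundary value age == 18.
    static boolean isAdultBoundaryMutant(int age) {
        return age > 18;
    }

    // Language-model-style mutant: the masked literal is replaced by a
    // contextually plausible alternative (21 is chosen for illustration).
    static boolean isAdultLmMutant(int age) {
        return age >= 21;
    }

    public static void main(String[] args) {
        // A test input on which a mutant's behaviour diverges from the
        // original "kills" that mutant.
        System.out.println(isAdult(18) != isAdultBoundaryMutant(18)); // true: killed
        System.out.println(isAdult(19) != isAdultLmMutant(19));       // true: killed
    }
}

Note that the boundary mutant is killed only by a test that exercises the exact boundary value (age == 18), which illustrates why mutants that are cheap to generate can still be expensive to kill, and hence why mutant selection strategies matter for cost-effectiveness.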