Nequivack: Assessing Mutation Score Confidence

2016 IEEE Ninth International Conference on Software Testing, Verification and Validation Workshops (ICSTW) Pub Date : 2016-04-11 DOI:10.1109/ICSTW.2016.29

Dominik Holling, Sebastian Banescu, Marco Probst, A. Petrovska, A. Pretschner


Citations: 11

Abstract

The mutation score is defined as the number of killed mutants divided by the number of non-equivalent mutants. However, whether a mutant is equivalent to the original program is undecidable in general. Thus, even when a test suite is improved, the mutation score assessing it may become worse during the development of a system, because of equivalent mutants introduced during mutant creation. This is a fundamental problem. Using static analysis and symbolic execution, we show how to establish non-equivalence or "don't know" among mutants. If the number of "don't know" results is small, this is a good indicator that a computed mutation score actually reflects the above definition. We can therefore have increased confidence that mutation score trends correspond to actual improvements of a test suite's quality, and are not overly polluted by equivalent mutants. Using a set of 14 representative unit-size programs, we show that for some, but not all, of these programs, this confidence can indeed be established. We also evaluate the reproducibility, efficiency and effectiveness of our Nequivack tool. We find that results are fully reproducible. A single mutant analysis can be performed within 3 seconds on average, which is efficient for practical and industrial applications.
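The link between the number of "don't know" results and confidence in the score can be made concrete with a little arithmetic. The following is a minimal sketch, not the Nequivack implementation: the function name, its parameters, and the assumption that every mutant is either killed, shown non-equivalent while still alive, or left as "don't know" are illustrative choices not taken from the paper. Under those assumptions, the true number of non-equivalent mutants lies between the number known to be non-equivalent and the total, which bounds the mutation score from both sides.

```python
from fractions import Fraction


def mutation_score_bounds(total, killed, proven_non_equivalent_alive):
    """Bound the true mutation score when some mutants are "don't know".

    total: number of generated mutants (hypothetical input).
    killed: mutants killed by the test suite (necessarily non-equivalent).
    proven_non_equivalent_alive: surviving mutants shown to be non-equivalent.
    Returns (lower_bound, upper_bound, unknown_count).
    """
    if total <= 0 or killed < 0 or proven_non_equivalent_alive < 0:
        raise ValueError("counts must be non-negative and total positive")
    known_non_equivalent = killed + proven_non_equivalent_alive
    if known_non_equivalent > total:
        raise ValueError("classified mutants exceed total")

    # Everything that is neither killed nor proven non-equivalent is unknown.
    unknown = total - known_non_equivalent

    if killed == 0:
        # No kills: the score is 0 whenever it is defined at all.
        return Fraction(0), Fraction(0), unknown

    # Worst case: every unknown mutant is non-equivalent (largest denominator).
    lower = Fraction(killed, total)
    # Best case: every unknown mutant is equivalent (smallest denominator).
    upper = Fraction(killed, known_non_equivalent)
    return lower, upper, unknown


if __name__ == "__main__":
    low, high, unknown = mutation_score_bounds(100, 70, 20)
    print(f"unknown={unknown}, score in [{float(low):.3f}, {float(high):.3f}]")
```

With few unknown mutants the two bounds are close together, so a change in the computed score is more likely to reflect a real change in test suite quality; with many unknowns the interval widens and the trend is harder to interpret.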



