Comparing Mutation Testing at the Levels of Source Code and Compiler Intermediate Representation

2019 12th IEEE Conference on Software Testing, Validation and Verification (ICST). Pub Date: 2019-04-22. DOI: 10.1109/ICST.2019.00021

Farah Hariri, A. Shi, V. Fernando, Suleman Mahmood, D. Marinov


Citations: 21

Abstract

Mutation testing is widely used in research for evaluating the effectiveness of test suites. Multiple mutation tools exist that perform mutation at different levels, including traditional mutation testing at the level of source code (SRC) and more recent mutation testing at the level of compiler intermediate representation (IR). This paper presents an extensive comparison of mutation testing at the SRC and IR levels, specifically at the C programming language and the LLVM compiler IR levels. We use a mutation testing tool called SRCIROR that implements conceptually the same mutation operators at both levels. We also employ automated techniques to account for equivalent and duplicated mutants, and to determine minimal and surface mutants. We carry out our study on 15 programs from the Coreutils library. Overall, we find mutation testing to be better at the SRC level: the SRC level produces far fewer mutants and is thus less expensive, yet it still generates a similar number of minimal and surface mutants, and the mutation scores at both levels are very closely correlated. We also perform a case study on the Space program to evaluate which level's mutation score correlates better with the actual fault-detection capability of test suites sampled from Space's test pool. We find that the mutation score at neither level is strongly correlated with the actual fault-detection capability of test suites.
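As an illustration of the mutation score the abstract refers to, the following is a minimal sketch under the commonly used definition: the fraction of non-equivalent mutants that are killed by at least one test. The function name, the mutant and test identifiers, and the kill data are all hypothetical and are not taken from the paper or from SRCIROR's output.

```python
from typing import Dict, Set


def mutation_score(kills: Dict[str, Set[str]], equivalent: Set[str]) -> float:
    """Return the fraction of non-equivalent mutants killed by at least one test.

    `kills` maps each mutant id to the set of test ids that kill it.
    `equivalent` holds the ids of mutants judged equivalent to the original
    program; they are excluded from the denominator.
    """
    considered = [m for m in kills if m not in equivalent]
    if not considered:
        return 0.0
    killed = sum(1 for m in considered if kills[m])
    return killed / len(considered)


# Hypothetical kill data for one program: four mutants, two tests.
kills = {
    "m1": {"t1"},        # e.g. `a + b` mutated to `a - b`, killed by t1
    "m2": {"t1", "t2"},  # killed by both tests
    "m3": set(),         # survives every test
    "m4": set(),         # assumed equivalent, so it can never be killed
}
score = mutation_score(kills, equivalent={"m4"})
print(f"mutation score: {score:.3f}")
```

Computing the same quantity separately for SRC-level and IR-level mutants of each program would give the two score series whose correlation the abstract discusses.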


