Evaluating Search-Based Software Microbenchmark Prioritization

IF 6.5 · CAS Tier 1, Computer Science · Q1 COMPUTER SCIENCE, SOFTWARE ENGINEERING
Christoph Laaber, Tao Yue, Shaukat Ali
{"title":"Evaluating Search-Based Software Microbenchmark Prioritization","authors":"Christoph Laaber;Tao Yue;Shaukat Ali","doi":"10.1109/TSE.2024.3380836","DOIUrl":null,"url":null,"abstract":"Ensuring that software performance does not degrade after a code change is paramount. A solution is to regularly execute software microbenchmarks, a performance testing technique similar to (functional) unit tests, which, however, often becomes infeasible due to extensive runtimes. To address that challenge, research has investigated regression testing techniques, such as test case prioritization (TCP), which reorder the execution within a microbenchmark suite to detect larger performance changes sooner. Such techniques are either designed for unit tests and perform sub-par on microbenchmarks or require complex performance models, drastically reducing their potential application. In this paper, we empirically evaluate single- and multi-objective search-based microbenchmark prioritization techniques to understand whether they are more effective and efficient than greedy, coverage-based techniques. For this, we devise three search objectives, i.e., coverage to maximize, coverage overlap to minimize, and historical performance change detection to maximize. We find that search algorithms (SAs) are only competitive with but do not outperform the best greedy, coverage-based baselines. However, a simple greedy technique utilizing solely the performance change history (without coverage information) is equally or more effective than the best coverage-based techniques while being considerably more efficient, with a runtime overhead of less than \n<inline-formula><tex-math>$1$</tex-math></inline-formula>\n%. These results show that simple, non-coverage-based techniques are a better fit for microbenchmarks than complex coverage-based techniques.","PeriodicalId":13324,"journal":{"name":"IEEE Transactions on Software Engineering","volume":null,"pages":null},"PeriodicalIF":6.5000,"publicationDate":"2024-03-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"IEEE Transactions on Software Engineering","FirstCategoryId":"94","ListUrlMain":"https://ieeexplore.ieee.org/document/10478254/","RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"COMPUTER SCIENCE, SOFTWARE ENGINEERING","Score":null,"Total":0}
Citations: 0

Abstract

Ensuring that software performance does not degrade after a code change is paramount. A solution is to regularly execute software microbenchmarks, a performance testing technique similar to (functional) unit tests, which, however, often becomes infeasible due to extensive runtimes. To address that challenge, research has investigated regression testing techniques, such as test case prioritization (TCP), which reorder the execution within a microbenchmark suite to detect larger performance changes sooner. Such techniques are either designed for unit tests and perform sub-par on microbenchmarks or require complex performance models, drastically reducing their potential application. In this paper, we empirically evaluate single- and multi-objective search-based microbenchmark prioritization techniques to understand whether they are more effective and efficient than greedy, coverage-based techniques. For this, we devise three search objectives, i.e., coverage to maximize, coverage overlap to minimize, and historical performance change detection to maximize. We find that search algorithms (SAs) are only competitive with but do not outperform the best greedy, coverage-based baselines. However, a simple greedy technique utilizing solely the performance change history (without coverage information) is equally or more effective than the best coverage-based techniques while being considerably more efficient, with a runtime overhead of less than 1%. These results show that simple, non-coverage-based techniques are a better fit for microbenchmarks than complex coverage-based techniques.
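
The abstract's key finding, a greedy ordering driven only by each benchmark's past performance changes rather than coverage, can be illustrated with a minimal sketch. This is a hypothetical Python illustration assuming a per-benchmark history of relative performance changes; the function name and data are illustrative assumptions, not the authors' implementation.

```python
# Hypothetical sketch: greedy prioritization of microbenchmarks by the magnitude
# of performance changes they detected in past runs (no coverage information).
from typing import Dict, List


def prioritize_by_change_history(history: Dict[str, List[float]]) -> List[str]:
    """Order benchmarks so that those with the largest historical performance
    change (in %) run first; benchmarks without history fall to the end."""
    def max_past_change(name: str) -> float:
        return max((abs(c) for c in history.get(name, [])), default=0.0)

    return sorted(history, key=max_past_change, reverse=True)


if __name__ == "__main__":
    # Past relative performance changes (%) per benchmark (made-up example data).
    change_history = {
        "benchParseJson": [0.5, 12.3, 1.1],
        "benchSerialize": [0.2, 0.4],
        "benchCompress": [25.0, 3.2],
    }
    print(prioritize_by_change_history(change_history))
    # -> ['benchCompress', 'benchParseJson', 'benchSerialize']
```

Because such an ordering needs only historical measurement data, its runtime overhead stays negligible compared to coverage-based prioritization, which matches the paper's reported overhead of less than 1%.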
Source journal:
IEEE Transactions on Software Engineering
Category: Engineering & Technology - Engineering: Electrical & Electronic
CiteScore: 9.70
Self-citation rate: 10.80%
Articles per year: 724
Review time: 6 months
Journal description:
IEEE Transactions on Software Engineering seeks contributions comprising well-defined theoretical results and empirical studies with potential impacts on software construction, analysis, or management. The scope of this Transactions extends from fundamental mechanisms to the development of principles and their application in specific environments. Specific topic areas include:
a) Development and maintenance methods and models: Techniques and principles for specifying, designing, and implementing software systems, encompassing notations and process models.
b) Assessment methods: Software tests, validation, reliability models, test and diagnosis procedures, software redundancy, design for error control, and measurements and evaluation of process and product aspects.
c) Software project management: Productivity factors, cost models, schedule and organizational issues, and standards.
d) Tools and environments: Specific tools, integrated tool environments, associated architectures, databases, and parallel and distributed processing issues.
e) System issues: Hardware-software trade-offs.
f) State-of-the-art surveys: Syntheses and comprehensive reviews of the historical development within specific areas of interest.