Evaluating Search-Based Software Microbenchmark Prioritization

IF 6.5 · CAS Tier 1, Computer Science · Q1 COMPUTER SCIENCE, SOFTWARE ENGINEERING
Christoph Laaber, Tao Yue, Shaukat Ali
{"title":"Evaluating Search-Based Software Microbenchmark Prioritization","authors":"Christoph Laaber;Tao Yue;Shaukat Ali","doi":"10.1109/TSE.2024.3380836","DOIUrl":null,"url":null,"abstract":"Ensuring that software performance does not degrade after a code change is paramount. A solution is to regularly execute software microbenchmarks, a performance testing technique similar to (functional) unit tests, which, however, often becomes infeasible due to extensive runtimes. To address that challenge, research has investigated regression testing techniques, such as test case prioritization (TCP), which reorder the execution within a microbenchmark suite to detect larger performance changes sooner. Such techniques are either designed for unit tests and perform sub-par on microbenchmarks or require complex performance models, drastically reducing their potential application. In this paper, we empirically evaluate single- and multi-objective search-based microbenchmark prioritization techniques to understand whether they are more effective and efficient than greedy, coverage-based techniques. For this, we devise three search objectives, i.e., coverage to maximize, coverage overlap to minimize, and historical performance change detection to maximize. We find that search algorithms (SAs) are only competitive with but do not outperform the best greedy, coverage-based baselines. However, a simple greedy technique utilizing solely the performance change history (without coverage information) is equally or more effective than the best coverage-based techniques while being considerably more efficient, with a runtime overhead of less than \n<inline-formula><tex-math>$1$</tex-math></inline-formula>\n%. These results show that simple, non-coverage-based techniques are a better fit for microbenchmarks than complex coverage-based techniques.","PeriodicalId":13324,"journal":{"name":"IEEE Transactions on Software Engineering","volume":null,"pages":null},"PeriodicalIF":6.5000,"publicationDate":"2024-03-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"IEEE Transactions on Software Engineering","FirstCategoryId":"94","ListUrlMain":"https://ieeexplore.ieee.org/document/10478254/","RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"COMPUTER SCIENCE, SOFTWARE ENGINEERING","Score":null,"Total":0}
Citations: 0

Abstract

Ensuring that software performance does not degrade after a code change is paramount. A solution is to regularly execute software microbenchmarks, a performance testing technique similar to (functional) unit tests, which, however, often becomes infeasible due to extensive runtimes. To address that challenge, research has investigated regression testing techniques, such as test case prioritization (TCP), which reorder the execution within a microbenchmark suite to detect larger performance changes sooner. Such techniques are either designed for unit tests and perform sub-par on microbenchmarks or require complex performance models, drastically reducing their potential application. In this paper, we empirically evaluate single- and multi-objective search-based microbenchmark prioritization techniques to understand whether they are more effective and efficient than greedy, coverage-based techniques. For this, we devise three search objectives, i.e., coverage to maximize, coverage overlap to minimize, and historical performance change detection to maximize. We find that search algorithms (SAs) are only competitive with but do not outperform the best greedy, coverage-based baselines. However, a simple greedy technique utilizing solely the performance change history (without coverage information) is equally or more effective than the best coverage-based techniques while being considerably more efficient, with a runtime overhead of less than 1%. These results show that simple, non-coverage-based techniques are a better fit for microbenchmarks than complex coverage-based techniques.
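
The abstract's key finding, a greedy ordering driven only by each benchmark's past performance changes rather than coverage, can be illustrated with a minimal sketch. This is a hypothetical Python illustration assuming a per-benchmark history of relative performance changes; the function name and data are illustrative assumptions, not the authors' implementation.

```python
# Hypothetical sketch: greedy prioritization of microbenchmarks by the magnitude
# of performance changes they detected in past runs (no coverage information).
from typing import Dict, List


def prioritize_by_change_history(history: Dict[str, List[float]]) -> List[str]:
    """Order benchmarks so that those with the largest historical performance
    change (in %) run first; benchmarks without history fall to the end."""
    def max_past_change(name: str) -> float:
        return max((abs(c) for c in history.get(name, [])), default=0.0)

    return sorted(history, key=max_past_change, reverse=True)


if __name__ == "__main__":
    # Past relative performance changes (%) per benchmark (made-up example data).
    change_history = {
        "benchParseJson": [0.5, 12.3, 1.1],
        "benchSerialize": [0.2, 0.4],
        "benchCompress": [25.0, 3.2],
    }
    print(prioritize_by_change_history(change_history))
    # -> ['benchCompress', 'benchParseJson', 'benchSerialize']
```

Because such an ordering needs only historical measurement data, its runtime overhead stays negligible compared to coverage-based prioritization, which matches the paper's reported overhead of less than 1%.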
Source journal:
IEEE Transactions on Software Engineering
Category: Engineering & Technology - Engineering: Electrical & Electronic
CiteScore: 9.70
Self-citation rate: 10.80%
Articles per year: 724
Review time: 6 months
Journal description:
IEEE Transactions on Software Engineering seeks contributions comprising well-defined theoretical results and empirical studies with potential impacts on software construction, analysis, or management. The scope of this Transactions extends from fundamental mechanisms to the development of principles and their application in specific environments. Specific topic areas include:
a) Development and maintenance methods and models: Techniques and principles for specifying, designing, and implementing software systems, encompassing notations and process models.
b) Assessment methods: Software tests, validation, reliability models, test and diagnosis procedures, software redundancy, design for error control, and measurements and evaluation of process and product aspects.
c) Software project management: Productivity factors, cost models, schedule and organizational issues, and standards.
d) Tools and environments: Specific tools, integrated tool environments, associated architectures, databases, and parallel and distributed processing issues.
e) System issues: Hardware-software trade-offs.
f) State-of-the-art surveys: Syntheses and comprehensive reviews of the historical development within specific areas of interest.