Distance-Based Sampling of Software Configuration Spaces

Christian Kaltenecker, A. Grebhahn, Norbert Siegmund, Jianmei Guo, S. Apel
2019 IEEE/ACM 41st International Conference on Software Engineering (ICSE), pp. 1084-1094. Published 2019-05-01.
DOI: 10.1109/ICSE.2019.00112 (https://doi.org/10.1109/ICSE.2019.00112)
Citations: 81

Abstract

Configurable software systems provide a multitude of configuration options to adjust and optimize their functional and non-functional properties. For instance, to find the fastest configuration for a given setting, a brute-force strategy measures the performance of all configurations, which is typically intractable. To address this challenge, state-of-the-art strategies rely on machine learning, analyzing only a few configurations (i.e., a sample set) to predict the performance of the others. However, accurate performance predictions require a representative sample set of configurations. To this end, different sampling strategies have been proposed, which come with different advantages (e.g., covering the configuration space systematically) and disadvantages (e.g., the need to enumerate all configurations). In our experiments, we found that most sampling strategies do not cover the configuration space well with respect to relevant performance values; that is, they miss important configurations with distinct performance behavior. Based on this observation, we devise a new sampling strategy, called distance-based sampling, that uses a distance metric and a probability distribution to spread the configurations of the sample set across the configuration space according to the given distribution. This way, we cover different kinds of interactions among configuration options in the sample set. To demonstrate the merits of distance-based sampling, we compare it to state-of-the-art sampling strategies, such as t-wise sampling, on 10 real-world configurable software systems. Our results show that distance-based sampling leads to more accurate performance models for medium to large sample sets.
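
The core idea of combining a distance metric with a probability distribution can be illustrated with a small sketch. The abstract does not specify the metric, the distribution, or any implementation details, so everything below is an assumption made for illustration: options are binary, the distance of a configuration is the number of enabled options, the distribution over distances is uniform, and the function name `distance_based_sample` and the toy constraint are hypothetical. The sketch also enumerates the whole (tiny) configuration space for simplicity, which is exactly what becomes infeasible for real systems.

```python
import itertools
import random
from collections import defaultdict


def distance_based_sample(num_options, is_valid, sample_size, seed=0):
    """Pick configurations so that their distances are spread uniformly.

    Assumption (not from the abstract): the distance of a configuration is
    the number of enabled binary options, i.e. its distance from the
    configuration in which every option is disabled.
    """
    rng = random.Random(seed)

    # Group all valid configurations by distance. Enumerating the space is
    # only acceptable here because the toy space is tiny.
    by_distance = defaultdict(list)
    for config in itertools.product((0, 1), repeat=num_options):
        if is_valid(config):
            by_distance[sum(config)].append(config)

    sample = []
    while len(sample) < sample_size and by_distance:
        # Step 1: draw a distance from the chosen distribution (uniform here).
        distance = rng.choice(sorted(by_distance))
        # Step 2: draw a not-yet-sampled configuration at that distance.
        bucket = by_distance[distance]
        sample.append(bucket.pop(rng.randrange(len(bucket))))
        if not bucket:
            del by_distance[distance]
    return sample


def toy_constraint(config):
    """Hypothetical constraint: options 0 and 1 are mutually exclusive."""
    return not (config[0] and config[1])


sample = distance_based_sample(num_options=5, is_valid=toy_constraint, sample_size=8)
for config in sample:
    print(sum(config), config)
```

Drawing the distance first is what distinguishes this sketch from uniform random sampling over configurations: because far more configurations have a medium number of enabled options than very few or very many, uniform sampling would concentrate on medium distances, whereas here every distance is equally likely to be chosen.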