Distance-Based Sampling of Software Configuration Spaces

2019 IEEE/ACM 41st International Conference on Software Engineering (ICSE), pp. 1084-1094. Published: 2019-05-01. DOI: 10.1109/ICSE.2019.00112

Christian Kaltenecker, A. Grebhahn, Norbert Siegmund, Jianmei Guo, S. Apel


Citations: 81

Abstract

Configurable software systems provide a multitude of configuration options to adjust and optimize their functional and non-functional properties. For instance, to find the fastest configuration for a given setting, a brute-force strategy measures the performance of all configurations, which is typically intractable. To address this challenge, state-of-the-art strategies rely on machine learning, analyzing only a few configurations (i.e., a sample set) to predict the performance of other configurations. However, accurate performance predictions require a representative sample set of configurations. Different sampling strategies have been proposed for this task, each with its own advantages (e.g., covering the configuration space systematically) and disadvantages (e.g., the need to enumerate all configurations). In our experiments, we found that most sampling strategies do not achieve good coverage of the configuration space with respect to relevant performance values. That is, they miss important configurations with distinct performance behavior. Based on this observation, we devise a new sampling strategy, called distance-based sampling, which uses a distance metric together with a given probability distribution to spread the configurations of the sample set across the configuration space. This way, the sample set covers different kinds of interactions among configuration options. To demonstrate the merits of distance-based sampling, we compare it to state-of-the-art sampling strategies, such as t-wise sampling, on 10 real-world configurable software systems. Our results show that distance-based sampling leads to more accurate performance models for medium to large sample sets.
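The abstract describes distance-based sampling only at a high level. The following is a minimal Python sketch under stated assumptions, not the authors' implementation. It treats every option as binary and ignores constraints among options, which real configuration spaces usually have. It measures a configuration's distance as the number of enabled options, i.e., its Manhattan distance to the configuration with all options disabled. It draws distances from a uniform distribution and then picks a random configuration at the drawn distance. The function name `distance_based_sample` and its parameters are hypothetical.

```python
import math
import random
from typing import FrozenSet, List, Set


def distance_based_sample(num_options: int, sample_size: int,
                          seed: int = 0) -> List[FrozenSet[int]]:
    """Draw `sample_size` distinct configurations of `num_options` binary options.

    Hypothetical sketch: a configuration is the frozenset of enabled option
    indices, and its distance is the number of enabled options (Manhattan
    distance to the all-disabled configuration). Constraints between options
    are ignored.
    """
    if sample_size > 2 ** num_options:
        raise ValueError("sample_size exceeds the number of configurations")
    rng = random.Random(seed)
    # Configurations still unsampled at each distance d; initially C(n, d).
    remaining = {d: math.comb(num_options, d) for d in range(num_options + 1)}
    seen: Set[FrozenSet[int]] = set()
    sample: List[FrozenSet[int]] = []
    while len(sample) < sample_size:
        # Assumed distribution: uniform over the distances that still have
        # unsampled configurations.
        d = rng.choice([dist for dist, left in remaining.items() if left > 0])
        # Enable d options chosen uniformly at random.
        config = frozenset(rng.sample(range(num_options), d))
        if config in seen:
            continue  # duplicate; draw a new distance and configuration
        seen.add(config)
        sample.append(config)
        remaining[d] -= 1
    return sample


if __name__ == "__main__":
    for cfg in distance_based_sample(num_options=6, sample_size=8, seed=1):
        print(len(cfg), sorted(cfg))
```

Because the sketch draws distances rather than configurations uniformly, configurations with very few or very many enabled options are sampled more often than under plain uniform random sampling. With 6 options, for example, the all-disabled configuration has probability 1/64 under uniform random sampling but 1/7 of being targeted here. This is one simple way to spread a sample across the space in the sense the abstract describes.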
