Dividable Configuration Performance Learning

Jingzhi Gong, Tao Chen, Rami Bahsoon
{"title":"Dividable Configuration Performance Learning","authors":"Jingzhi Gong, Tao Chen, Rami Bahsoon","doi":"arxiv-2409.07629","DOIUrl":null,"url":null,"abstract":"Machine/deep learning models have been widely adopted for predicting the\nconfiguration performance of software systems. However, a crucial yet\nunaddressed challenge is how to cater for the sparsity inherited from the\nconfiguration landscape: the influence of configuration options (features) and\nthe distribution of data samples are highly sparse. In this paper, we propose a\nmodel-agnostic and sparsity-robust framework for predicting configuration\nperformance, dubbed DaL, based on the new paradigm of dividable learning that\nbuilds a model via \"divide-and-learn\". To handle sample sparsity, the samples\nfrom the configuration landscape are divided into distant divisions, for each\nof which we build a sparse local model, e.g., regularized Hierarchical\nInteraction Neural Network, to deal with the feature sparsity. A newly given\nconfiguration would then be assigned to the right model of division for the\nfinal prediction. Further, DaL adaptively determines the optimal number of\ndivisions required for a system and sample size without any extra training or\nprofiling. Experiment results from 12 real-world systems and five sets of\ntraining data reveal that, compared with the state-of-the-art approaches, DaL\nperforms no worse than the best counterpart on 44 out of 60 cases with up to\n1.61x improvement on accuracy; requires fewer samples to reach the same/better\naccuracy; and producing acceptable training overhead. In particular, the\nmechanism that adapted the parameter d can reach the optimal value for 76.43%\nof the individual runs. The result also confirms that the paradigm of dividable\nlearning is more suitable than other similar paradigms such as ensemble\nlearning for predicting configuration performance. Practically, DaL\nconsiderably improves different global models when using them as the underlying\nlocal models, which further strengthens its flexibility.","PeriodicalId":501278,"journal":{"name":"arXiv - CS - Software Engineering","volume":"8 1","pages":""},"PeriodicalIF":0.0000,"publicationDate":"2024-09-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"arXiv - CS - Software Engineering","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/arxiv-2409.07629","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 0

Abstract

Machine/deep learning models have been widely adopted for predicting the configuration performance of software systems. However, a crucial yet unaddressed challenge is how to cater for the sparsity inherited from the configuration landscape: the influence of configuration options (features) and the distribution of data samples are both highly sparse. In this paper, we propose a model-agnostic and sparsity-robust framework for predicting configuration performance, dubbed DaL, based on the new paradigm of dividable learning that builds a model via "divide-and-learn". To handle sample sparsity, the samples from the configuration landscape are divided into distant divisions, for each of which we build a sparse local model, e.g., a regularized Hierarchical Interaction Neural Network, to deal with feature sparsity. A newly given configuration is then assigned to the model of the right division for the final prediction. Further, DaL adaptively determines the optimal number of divisions required for a system and sample size without any extra training or profiling. Experimental results from 12 real-world systems and five sets of training data reveal that, compared with state-of-the-art approaches, DaL performs no worse than the best counterpart on 44 out of 60 cases, with up to 1.61x improvement in accuracy; requires fewer samples to reach the same or better accuracy; and incurs acceptable training overhead. In particular, the mechanism that adapts the parameter d reaches the optimal value in 76.43% of the individual runs. The results also confirm that the paradigm of dividable learning is more suitable than similar paradigms, such as ensemble learning, for predicting configuration performance. Practically, DaL considerably improves different global models when using them as the underlying local models, which further strengthens its flexibility.
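The abstract does not spell out implementation details, but the "divide-and-learn" workflow it describes can be sketched roughly as follows. The snippet below is a minimal illustration under assumptions, not the authors' implementation: divisions are formed by clustering the measured performance values with k-means, a RandomForestRegressor stands in for the regularized Hierarchical Interaction Neural Network used as the sparse local model in DaL, a RandomForestClassifier routes new configurations to a division, and the number of divisions d is fixed rather than adapted as DaL does.

```python
# Illustrative sketch of divide-and-learn (NOT the authors' code).
# Assumptions: k-means on performance values forms the divisions, random forests
# replace DaL's regularized Hierarchical Interaction Neural Network local models,
# and d is fixed instead of adaptively determined.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.ensemble import RandomForestClassifier, RandomForestRegressor


class DivideAndLearn:
    def __init__(self, d=2):
        self.d = d            # number of divisions (fixed in this sketch)
        self.router = None    # assigns a new configuration to a division
        self.locals_ = []     # one local regressor per division

    def fit(self, X, y):
        # 1. Divide: group training samples by their performance values so each
        #    division covers a nearby region of the sparse landscape.
        labels = KMeans(n_clusters=self.d, n_init=10).fit_predict(y.reshape(-1, 1))
        # 2. Learn: train one local model per division.
        self.locals_ = [
            RandomForestRegressor().fit(X[labels == k], y[labels == k])
            for k in range(self.d)
        ]
        # 3. Route: learn to map a configuration (features only) to its division.
        self.router = RandomForestClassifier().fit(X, labels)
        return self

    def predict(self, X):
        # A new configuration is sent to the local model of its predicted division.
        divisions = self.router.predict(X)
        return np.array([
            self.locals_[k].predict(x.reshape(1, -1))[0]
            for x, k in zip(X, divisions)
        ])
```

Usage would follow the standard scikit-learn pattern on a (hypothetical) dataset where X holds the configuration options and y the measured performance: `DivideAndLearn(d=2).fit(X_train, y_train).predict(X_test)`. The key design point the abstract highlights, and which this sketch omits, is that DaL chooses d per system and sample size without extra training or profiling.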