Dividable Configuration Performance Learning

IF 6.5; Region 1, Computer Science; Q1 COMPUTER SCIENCE, SOFTWARE ENGINEERING

IEEE Transactions on Software Engineering Pub Date : 2024-11-05 DOI:10.1109/TSE.2024.3491945

Jingzhi Gong;Tao Chen;Rami Bahsoon

IEEE Transactions on Software Engineering, vol. 51, no. 1, pp. 106-134. Journal Article. Full text: https://ieeexplore.ieee.org/document/10744216/

Citations: 0

Abstract


Dividable Configuration Performance Learning

Machine/deep learning models have been widely adopted to predict the configuration performance of software systems. However, a crucial yet unaddressed challenge is how to cater for the sparsity inherited from the configuration landscape: the influence of configuration options (features) and the distribution of data samples are highly sparse. In this paper, we propose a model-agnostic and sparsity-robust framework for predicting configuration performance, dubbed DaL, based on the new paradigm of dividable learning that builds a model via “divide-and-learn”. To handle sample sparsity, the samples from the configuration landscape are divided into distant divisions, for each of which we build a sparse local model, e.g., regularized Hierarchical Interaction Neural Network, to deal with the feature sparsity. A newly given configuration would then be assigned to the right model of division for the final prediction. Further, DaL adaptively determines the optimal number of divisions required for a system and sample size without any extra training or profiling. Experimental results from 12 real-world systems and five sets of training data reveal that, compared with the state-of-the-art approaches, DaL performs no worse than the best counterpart on 44 out of 60 cases (within which 31 cases are significantly better) with up to $1.61\times$ improvement on accuracy; requires fewer samples to reach the same/better accuracy; and produces acceptable training overhead. In particular, the mechanism that adapts the parameter $d$ can reach the optimal value for 76.43% of the individual runs. The result also confirms that the paradigm of dividable learning is more suitable than other similar paradigms such as ensemble learning for predicting configuration performance. Practically, DaL considerably improves different global models when using them as the underlying local models, which further strengthens its flexibility. To promote open science, all the data, code, and supplementary materials of this work can be accessed at our repository: https://github.com/ideas-labo/DaL-ext.


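Source journal: IEEE Transactions on Software Engineering (Engineering & Technology; Engineering: Electrical and Electronic)

CiteScore: 9.70

Self-citation rate: 10.80%

Articles published: 724

Review time: 6 months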

Journal description: IEEE Transactions on Software Engineering seeks contributions comprising well-defined theoretical results and empirical studies with potential impacts on software construction, analysis, or management. The scope of this Transactions extends from fundamental mechanisms to the development of principles and their application in specific environments. Specific topic areas include:

a) Development and maintenance methods and models: Techniques and principles for specifying, designing, and implementing software systems, encompassing notations and process models.

b) Assessment methods: Software tests, validation, reliability models, test and diagnosis procedures, software redundancy, design for error control, and measurements and evaluation of process and product aspects.

c) Software project management: Productivity factors, cost models, schedule and organizational issues, and standards.

d) Tools and environments: Specific tools, integrated tool environments, associated architectures, databases, and parallel and distributed processing issues.

e) System issues: Hardware-software trade-offs.

f) State-of-the-art surveys: Syntheses and comprehensive reviews of the historical development within specific areas of interest.
