Performance Prediction From Source Code Is Task and Domain Specific

2023 IEEE/ACM 31st International Conference on Program Comprehension (ICPC). Pub Date: 2023-05-01. DOI: 10.1109/ICPC58990.2023.00015

Markus Böck, Sarra Habchi, Mathieu Nayrolles, Jürgen Cito


Citations: 1

Abstract

Performance is key to the success and adoption of software systems. In video games, performance is commonly highlighted as one of the top quality concerns raised by players. To check the performance of their systems, development teams tend to rely on profiling and monitoring tools, which observe program executions to identify regressions. The use of static analysis tools for this purpose has so far been limited. Lately, the success of Large Language Models in many code analytics tools has led to attempts to leverage them in static performance analysis. These studies showed promising results in predicting runtime and regressions on large public datasets. In this paper, we evaluate the usability of such models in practice, particularly in the domain of video games. We train a state-of-the-art neural network on the Code4Bench dataset to predict runtime regressions for programming competition programs, then evaluate its ability to generalize to new domains. Our results show that these models perform well (e.g., 95.73% accuracy for performance comparison) on the original domain for programs solving in-sample programming tasks, yet fail to generalize to out-of-sample tasks. Furthermore, we show that transfer techniques such as domain adversarial adaptation and model fine-tuning are not sufficient to transfer these models to the target industrial domain of AAA games.
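The distinction between in-sample and out-of-sample tasks can be illustrated with a minimal sketch. The abstract does not describe how the authors partitioned their data, so everything below is an assumption made for illustration: the record layout (a `task` key per program) and the function names `split_by_submission` and `split_by_task` are hypothetical. The first function lets programs from the same task appear in both training and test sets (in-sample tasks); the second keeps every task entirely on one side (out-of-sample tasks).

```python
import random


def split_by_submission(records, test_fraction=0.2, seed=0):
    """In-sample split (hypothetical helper): programs are shuffled
    individually, so a test program may solve a task seen in training."""
    rng = random.Random(seed)
    shuffled = list(records)
    rng.shuffle(shuffled)
    cut = int(len(shuffled) * (1 - test_fraction))
    return shuffled[:cut], shuffled[cut:]


def split_by_task(records, test_fraction=0.2, seed=0):
    """Out-of-sample split (hypothetical helper): whole tasks are held out,
    so no test program solves a task that appears in training."""
    rng = random.Random(seed)
    tasks = sorted({r["task"] for r in records})
    rng.shuffle(tasks)
    n_test = max(1, int(len(tasks) * test_fraction))
    test_tasks = set(tasks[:n_test])
    train = [r for r in records if r["task"] not in test_tasks]
    test = [r for r in records if r["task"] in test_tasks]
    return train, test


# Toy data: 10 tasks with 5 programs each (assumed layout, not Code4Bench's).
records = [{"task": t, "program": f"p{t}_{i}"} for t in range(10) for i in range(5)]
train, test = split_by_task(records)
shared = {r["task"] for r in train} & {r["task"] for r in test}
print(len(train), len(test), len(shared))
```

A model evaluated only under the first kind of split can score well by exploiting task-specific cues, which is one way to read the gap the abstract reports between in-sample and out-of-sample results.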

