Tarek Ramadan, Nathan Pinnow, Chase Phelps, Jayaraman J. Thiagarajan, Tanzima Z. Islam
{"title":"Structure-Aware Representation Learning for Effective Performance Prediction","authors":"Tarek Ramadan, Nathan Pinnow, Chase Phelps, Jayaraman J. Thiagarajan, Tanzima Z. Islam","doi":"10.1002/cpe.70046","DOIUrl":null,"url":null,"abstract":"<p>Application performance is a function of several unknowns stemming from the interactions between the application, runtime, OS, and underlying hardware, making it challenging to model performance using deep learning techniques, especially without a large labeled dataset. Collecting such labeled longitudinal datasets can take weeks. Intuitively, developers could save analysis time during code development by taking a comparative approach between multiple applications. However, the unknown dynamic interactions between applications and execution environments make it difficult for deep learning-based models to predict the performance of new applications. In this paper, we address these problems by presenting a labeled dataset for the community and taking a comparative analysis approach to explore the source code differences between different correct implementations of the same problem. This paper assesses the feasibility of using purely static information, for example, Abstract Syntax Tree (AST), of applications to predict performance change based on code structure. We evaluate several deep learning-based representation learning techniques for source code and propose an architecture for the tree-based Long Short-Term Memory (LSTM) models to discover latent representations for a source code's hierarchical structure. 
We demonstrate that our proposed architecture enables feed-forward predictive models to predict change in performance using source code with up to 84% accuracy.</p>","PeriodicalId":55214,"journal":{"name":"Concurrency and Computation-Practice & Experience","volume":"37 9-11","pages":""},"PeriodicalIF":1.5000,"publicationDate":"2025-04-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://onlinelibrary.wiley.com/doi/epdf/10.1002/cpe.70046","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Concurrency and Computation-Practice & Experience","FirstCategoryId":"94","ListUrlMain":"https://onlinelibrary.wiley.com/doi/10.1002/cpe.70046","RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q3","JCRName":"COMPUTER SCIENCE, SOFTWARE ENGINEERING","Score":null,"Total":0}
Citations: 0
Abstract
Application performance is a function of several unknowns stemming from the interactions between the application, runtime, OS, and underlying hardware, making it challenging to model performance using deep learning techniques, especially without a large labeled dataset. Collecting such labeled longitudinal datasets can take weeks. Intuitively, developers could save analysis time during code development by taking a comparative approach between multiple applications. However, the unknown dynamic interactions between applications and execution environments make it difficult for deep learning-based models to predict the performance of new applications. In this paper, we address these problems by presenting a labeled dataset for the community and taking a comparative analysis approach to explore the source code differences between different correct implementations of the same problem. This paper assesses the feasibility of using purely static information about applications, such as the Abstract Syntax Tree (AST), to predict performance change based on code structure. We evaluate several deep learning-based representation learning techniques for source code and propose an architecture for tree-based Long Short-Term Memory (LSTM) models to discover latent representations for a source code's hierarchical structure. We demonstrate that our proposed architecture enables feed-forward predictive models to predict change in performance from source code with up to 84% accuracy.
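To make the abstract's core idea concrete, here is a minimal sketch (not the authors' pipeline) of the kind of purely static AST information the paper builds on: two correct implementations of the same problem can differ structurally even though they compute the same result, and those structural differences are visible in their ASTs without ever running the code. The helper function `ast_node_types` and both example implementations are illustrative assumptions introduced here, using only Python's standard `ast` module.

```python
import ast

def ast_node_types(source):
    """Parse source code and count AST node types -- a purely static feature
    vector derived from code structure alone, with no execution required."""
    tree = ast.parse(source)
    counts = {}
    for node in ast.walk(tree):
        name = type(node).__name__
        counts[name] = counts.get(name, 0) + 1
    return counts

# Two correct implementations of the same problem (sum of squares below n):
impl_loop = (
    "def f(n):\n"
    "    s = 0\n"
    "    for i in range(n):\n"
    "        s += i * i\n"
    "    return s\n"
)
impl_comp = (
    "def f(n):\n"
    "    return sum(i * i for i in range(n))\n"
)

a = ast_node_types(impl_loop)
b = ast_node_types(impl_comp)

# The loop version contains a For node; the generator version replaces it
# with a GeneratorExp -- a structural difference a model can learn from.
```

A learned model such as the tree-LSTM described in the abstract would consume the full hierarchical tree rather than flat node counts, but the same principle applies: the input is static structure, not runtime measurements.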
期刊介绍:
Concurrency and Computation: Practice and Experience (CCPE) publishes high-quality original research papers and authoritative research review papers in the overlapping fields of:
Parallel and distributed computing;
High-performance computing;
Computational and data science;
Artificial intelligence and machine learning;
Big data applications, algorithms, and systems;
Network science;
Ontologies and semantics;
Security and privacy;
Cloud/edge/fog computing;
Green computing; and
Quantum computing.