Measuring code efficiency optimization capabilities with ACEOB

Impact factor 3.7. Zone 2, Computer Science. JCR Q1: Computer Science, Software Engineering

Journal of Systems and Software. Pub Date: 2024-10-11. DOI: 10.1016/j.jss.2024.112250

Yue Pan, Xiuting Shao, Chen Lyu

Volume 219, Article 112250. Publisher page: https://www.sciencedirect.com/science/article/pii/S0164121224002942

Citations: 0

Abstract

As Moore’s Law gains diminish, software performance and efficiency become increasingly vital. Optimizing code efficiency is challenging, even for professional programmers. However, related research remains relatively scarce, and rigorously assessing models’ abilities to optimize code efficiency is fraught with difficulties. In response to this challenge, we first conduct an in-depth analysis of “code patterns” in the model training dataset, meticulously exploring human-written code. Secondly, we define a task for optimizing code efficiency and introduce the Automatic Code Efficiency Optimization Benchmark (ACEOB), which consists of 95,359 pairs of efficient–inefficient code aimed at assessing code efficiency optimization capabilities. To our knowledge, ACEOB is the first dataset specifically targeting Python code efficiency optimization. To evaluate models’ ability in optimizing code efficiency, we propose two new metrics: the Isomorphic Optimal Comparison CodeBLEU (IOCCB) metric and the Normalized Performance Index (NPI) metric, to assess the efficiency of model-generated code. We also evaluate several advanced code models, such as PolyCoder and CodeT5, after fine-tuning them on ACEOB and demonstrate that the efficiency of each model improves after introducing the NPI filter. However, it was observed that even ChatGPT does not perform optimally in code efficiency optimization tasks. Our dataset and models are available at: https://github.com/CodeGeneration2/ACEOB.

Editor’s note: Open Science material was validated by the Journal of Systems and Software Open Science Board.
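To make the idea of an efficient–inefficient code pair concrete, the following is a minimal Python sketch and is not taken from ACEOB itself. The two functions, the problem they solve, and the `normalized_score` helper are all hypothetical illustrations; in particular, `normalized_score` is a simple min-max normalization assumed for this example and should not be read as the paper's definition of NPI.

```python
import timeit


def count_pairs_slow(nums, target):
    """Inefficient version: checks every pair, O(n^2) time."""
    count = 0
    for i in range(len(nums)):
        for j in range(i + 1, len(nums)):
            if nums[i] + nums[j] == target:
                count += 1
    return count


def count_pairs_fast(nums, target):
    """Efficient version: one pass with a dict of seen values, O(n) time."""
    seen = {}
    count = 0
    for x in nums:
        # Every earlier value equal to (target - x) forms a pair with x.
        count += seen.get(target - x, 0)
        seen[x] = seen.get(x, 0) + 1
    return count


def normalized_score(runtime, fastest, slowest):
    """Hypothetical min-max normalization of a runtime (assumption, not NPI).

    Returns 1.0 when the runtime matches the fastest known solution and
    0.0 when it matches the slowest; values are clamped to [0, 1].
    """
    if slowest == fastest:
        return 1.0
    score = (slowest - runtime) / (slowest - fastest)
    return max(0.0, min(1.0, score))


if __name__ == "__main__":
    data = list(range(300))
    # Both versions must agree before their runtimes are compared.
    assert count_pairs_slow(data, 299) == count_pairs_fast(data, 299)

    t_slow = timeit.timeit(lambda: count_pairs_slow(data, 299), number=5)
    t_fast = timeit.timeit(lambda: count_pairs_fast(data, 299), number=5)
    print(f"slow: {t_slow:.4f}s  fast: {t_fast:.4f}s")
    print("score of fast version:", normalized_score(t_fast, t_fast, t_slow))
```

The sketch checks functional equivalence first and only then compares runtimes, since a faster program that returns different results is not an optimization of the original.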

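Source journal

Journal of Systems and Software (Engineering & Technology - Computer Science: Theory & Methods)

CiteScore: 8.60
Self-citation rate: 5.70%
Articles published: 193
Review time: 16 weeks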

About the journal: The Journal of Systems and Software publishes papers covering all aspects of software engineering and related hardware-software-systems issues. All articles should include a validation of the idea presented, e.g. through case studies, experiments, or systematic comparisons with other approaches already in practice. Topics of interest include, but are not limited to:
• Methods and tools for, and empirical studies on, software requirements, design, architecture, verification and validation, maintenance and evolution
• Agile, model-driven, service-oriented, open source and global software development
• Approaches for mobile, multiprocessing, real-time, distributed, cloud-based, dependable and virtualized systems
• Human factors and management concerns of software development
• Data management and big data issues of software systems
• Metrics and evaluation, data mining of software development resources
• Business and economic aspects of software development processes
The journal welcomes state-of-the-art surveys and reports of practical experience for all of these topics.