Measuring code efficiency optimization capabilities with ACEOB
Authors: Yue Pan, Xiuting Shao, Chen Lyu
Journal: Journal of Systems and Software, volume 219, Article 112250
Publication date: 2024-10-11
DOI: 10.1016/j.jss.2024.112250
URL: https://www.sciencedirect.com/science/article/pii/S0164121224002942
Impact factor: 3.7 (JCR Q1, Computer Science, Software Engineering)
As the gains from Moore’s Law diminish, software performance and efficiency become increasingly vital. Optimizing code efficiency is challenging, even for professional programmers. However, related research remains relatively scarce, and rigorously assessing models’ abilities to optimize code efficiency is fraught with difficulties. In response to this challenge, we first conduct an in-depth analysis of “code patterns” in the model training dataset, meticulously exploring human-written code. Second, we define a task for optimizing code efficiency and introduce the Automatic Code Efficiency Optimization Benchmark (ACEOB), which consists of 95,359 pairs of efficient–inefficient code aimed at assessing code efficiency optimization capabilities. To our knowledge, ACEOB is the first dataset specifically targeting Python code efficiency optimization. To assess the efficiency of model-generated code, we propose two new metrics: the Isomorphic Optimal Comparison CodeBLEU (IOCCB) metric and the Normalized Performance Index (NPI) metric. We also evaluate several advanced code models, such as PolyCoder and CodeT5, after fine-tuning them on ACEOB and demonstrate that the efficiency of each model improves after introducing the NPI filter. However, we observe that even ChatGPT does not perform optimally in code efficiency optimization tasks. Our dataset and models are available at: https://github.com/CodeGeneration2/ACEOB.
Editor’s note: Open Science material was validated by the Journal of Systems and Software Open Science Board.
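To make the idea of an efficient–inefficient code pair concrete, the following is a minimal illustrative sketch and is not taken from ACEOB. The two functions, the helper name `speedup`, and the timing parameters are all hypothetical, and the ratio it reports is a plain wall-clock speedup, not the paper's NPI or IOCCB metric, whose definitions are not given in this abstract.

```python
import timeit


def sum_of_squares_slow(n: int) -> int:
    """Inefficient variant (hypothetical example): O(n) explicit loop."""
    total = 0
    for i in range(n):
        total += i * i
    return total


def sum_of_squares_fast(n: int) -> int:
    """Efficient variant: closed form for 0^2 + 1^2 + ... + (n-1)^2."""
    return (n - 1) * n * (2 * n - 1) // 6


def speedup(slow, fast, arg: int, repeats: int = 5, number: int = 20) -> float:
    """Ratio of the best observed slow time to the best observed fast time.

    This is an assumed, simplified measure for illustration only.
    """
    t_slow = min(timeit.repeat(lambda: slow(arg), repeat=repeats, number=number))
    t_fast = min(timeit.repeat(lambda: fast(arg), repeat=repeats, number=number))
    # Guard against a zero reading on coarse timers.
    return t_slow / max(t_fast, 1e-12)


if __name__ == "__main__":
    n = 100_000
    # Both variants must agree before their speed is compared.
    assert sum_of_squares_slow(n) == sum_of_squares_fast(n)
    print(f"speedup: {speedup(sum_of_squares_slow, sum_of_squares_fast, n):.1f}x")
```

The sketch checks functional equivalence before timing, since a faster rewrite only counts as an optimization if it preserves the original behavior.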
About the journal:
The Journal of Systems and Software publishes papers covering all aspects of software engineering and related hardware-software-systems issues. All articles should include a validation of the idea presented, e.g. through case studies, experiments, or systematic comparisons with other approaches already in practice. Topics of interest include, but are not limited to:
• Methods and tools for, and empirical studies on, software requirements, design, architecture, verification and validation, maintenance and evolution
• Agile, model-driven, service-oriented, open source and global software development
• Approaches for mobile, multiprocessing, real-time, distributed, cloud-based, dependable and virtualized systems
• Human factors and management concerns of software development
• Data management and big data issues of software systems
• Metrics and evaluation, data mining of software development resources
• Business and economic aspects of software development processes
The journal welcomes state-of-the-art surveys and reports of practical experience for all of these topics.