E-code: Mastering efficient code generation through pretrained models and expert encoder group

IF 3.8 · CAS Zone 2 (Computer Science) · JCR Q2, COMPUTER SCIENCE, INFORMATION SYSTEMS

Information and Software Technology Pub Date : 2024-10-20 DOI:10.1016/j.infsof.2024.107602

Yue Pan, Chen Lyu, Zhenyu Yang, Lantian Li, Qi Liu, Xiuting Shao

Volume 178, Article 107602. Publication type: Journal Article. URL: https://www.sciencedirect.com/science/article/pii/S0950584924002076

Citations: 0

Abstract

Context:

With the waning of Moore’s Law, the software industry is placing increasing importance on finding alternative ways to keep improving performance. Both the importance of software performance optimization and the volume of research on it have grown in recent years, especially with advances driven by Large Language Models (LLMs). However, traditional strategies for rectifying performance flaws have shown significant limitations at the competitive code efficiency optimization level, and research on this topic is surprisingly scarce.

Objective:

This study aims to address the research gap in this domain, offering practical solutions to the various challenges encountered. Specifically, we have overcome the constraints of traditional performance error rectification strategies and developed a Language Model (LM) tailored for the competitive code efficiency optimization realm.

Methods:

We introduced E-code, an advanced program synthesis LM. Inspired by the recent success of expert LMs, we designed an innovative structure called the Expert Encoder Group. This structure employs multiple expert encoders to extract features tailored for different input types. We assessed the performance of E-code against other leading models on a competitive dataset and conducted in-depth ablation experiments.
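The idea of routing each input type to its own encoder can be illustrated with a minimal sketch. This is not the E-code architecture, which the abstract does not specify: the input types (`description`, `code`), the toy feature extractors, and the `ExpertEncoderGroup` class below are hypothetical stand-ins chosen only to show the pattern of one expert per input type whose outputs are combined into a single feature vector.

```python
from typing import Callable, Dict, List

# Hypothetical encoder type: maps raw text to a small feature vector.
Encoder = Callable[[str], List[float]]


def description_encoder(text: str) -> List[float]:
    """Toy expert for natural-language input: token count and mean token length."""
    tokens = text.split()
    mean_len = sum(len(t) for t in tokens) / len(tokens) if tokens else 0.0
    return [float(len(tokens)), mean_len]


def code_encoder(text: str) -> List[float]:
    """Toy expert for source code: line count and number of loop keywords."""
    lines = text.splitlines()
    loops = sum(
        line.split().count("for") + line.split().count("while") for line in lines
    )
    return [float(len(lines)), float(loops)]


class ExpertEncoderGroup:
    """Holds one encoder per input type and concatenates their features.

    Assumption for illustration only: features are combined by simple
    concatenation in sorted order of the input-type names.
    """

    def __init__(self, experts: Dict[str, Encoder]) -> None:
        self.experts = experts

    def encode(self, inputs: Dict[str, str]) -> List[float]:
        features: List[float] = []
        for input_type in sorted(self.experts):
            if input_type not in inputs:
                raise KeyError(f"missing input of type {input_type!r}")
            # Each input is seen only by the expert registered for its type.
            features.extend(self.experts[input_type](inputs[input_type]))
        return features


group = ExpertEncoderGroup(
    {"description": description_encoder, "code": code_encoder}
)
features = group.encode(
    {"description": "sum a list", "code": "for x in xs:\n    total += x"}
)
print(features)
```

In a learned model the toy functions would be replaced by trained neural encoders, and the combination step would likely be something richer than concatenation; the sketch only shows the dispatch-by-input-type structure.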

Results:

Upon systematic evaluation, E-code achieved a 54.98% improvement in code efficiency, significantly outperforming other advanced models. In the ablation experiments, we further validated the significance of the expert encoder group and other components within E-code.

Conclusion:

The research findings indicate that the expert encoder group can effectively handle various inputs in efficiency optimization tasks, significantly enhancing the model’s performance. In summary, this study paves new avenues for developing systems and methods to assist programmers in writing efficient code.

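Source journal

Information and Software Technology (Engineering & Technology, Computer Science: Software Engineering)

CiteScore: 9.10

Self-citation rate: 7.70%

Articles published: 164

Review time: 9.6 weeks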

About the journal: Information and Software Technology is the international archival journal focusing on research and experience that contributes to the improvement of software development practices. The journal's scope includes methods and techniques to better engineer software and manage its development. Articles submitted for review should have a clear component of software engineering or address ways to improve the engineering and management of software development. Areas covered by the journal include:

• Software management, quality and metrics
• Software processes
• Software architecture, modelling, specification, design and programming
• Functional and non-functional software requirements
• Software testing and verification & validation
• Empirical studies of all aspects of engineering and managing software development

Short Communications is a new section dedicated to short papers addressing new ideas, controversial opinions, "negative" results and much more. Read the Guide for Authors for more information. The journal encourages and welcomes submissions of systematic literature studies (reviews and maps) within the scope of the journal. Information and Software Technology is the premier outlet for systematic literature studies in software engineering.