Title: Evaluating pre-trained Large Language Models on zero shot prompts for parallelization of source code
Authors: Devansh Yadav, Shouvick Mondal
DOI: 10.1016/j.jss.2025.112543
Journal: Journal of Systems and Software, Volume 230, Article 112543
Publication date: 2025-07-12 (Journal Article)
Impact factor: 4.1; JCR: Q1 (Computer Science, Software Engineering)
URL: https://www.sciencedirect.com/science/article/pii/S0164121225002122
Citations: 0
Evaluating pre-trained Large Language Models on zero shot prompts for parallelization of source code
Large Language Models (LLMs) have become prominent in the software development life cycle, yet the generation of performant source code, particularly through automatic parallelization, remains underexplored. This study compares 23 pre-trained LLMs against the Intel C Compiler (icc), a state-of-the-art auto-parallelization tool, to evaluate their effectiveness in transforming sequential C source code into parallelized versions. Using 30 kernels from the PolyBench C benchmarks, we generated 667 parallelized code versions to assess LLMs' zero-shot parallelization capabilities. Our experiments reveal that LLMs can outperform icc in non-functional aspects like speedup, with 26.66% of cases surpassing icc's performance. The best LLM-generated code achieved a 7.5× speedup compared to icc's 1.08×. However, only 90 of the 667 generated versions (13.5%) were error-free and functionally correct, underscoring significant reliability challenges. After filtering out versions with compilation errors or data race issues through detailed memory and threading analysis, notable performance gains were observed. Challenges include increased cache miss rates and branch misses with higher thread counts, indicating that simply adding threads does not ensure better performance. Optimizing memory access, managing thread interactions, and validating code correctness are critical for LLM-generated parallel code. Our findings demonstrate that, even without fine-tuning or advanced prompting techniques, pre-trained LLMs can compete with decades-old non-LLM compiler technology in zero-shot sequential-to-parallel code translation. This highlights their potential in automating code parallelization while emphasizing the need to address reliability and performance optimization challenges.
About the journal:
The Journal of Systems and Software publishes papers covering all aspects of software engineering and related hardware-software-systems issues. All articles should include a validation of the idea presented, e.g. through case studies, experiments, or systematic comparisons with other approaches already in practice. Topics of interest include, but are not limited to:
• Methods and tools for, and empirical studies on, software requirements, design, architecture, verification and validation, maintenance and evolution
• Agile, model-driven, service-oriented, open source and global software development
• Approaches for mobile, multiprocessing, real-time, distributed, cloud-based, dependable and virtualized systems
• Human factors and management concerns of software development
• Data management and big data issues of software systems
• Metrics and evaluation, data mining of software development resources
• Business and economic aspects of software development processes
The journal welcomes state-of-the-art surveys and reports of practical experience for all of these topics.