Large language model evaluation for high-performance computing software development

IF 1.5 4区计算机科学 Q3 COMPUTER SCIENCE, SOFTWARE ENGINEERING

Concurrency and Computation-Practice & Experience Pub Date : 2024-09-04 DOI:10.1002/cpe.8269

William F. Godoy, Pedro Valero-Lara, Keita Teranishi, Prasanna Balaprakash, Jeffrey S. Vetter

{"title":"Large language model evaluation for high-performance computing software development","authors":"William F. Godoy, Pedro Valero-Lara, Keita Teranishi, Prasanna Balaprakash, Jeffrey S. Vetter","doi":"10.1002/cpe.8269","DOIUrl":null,"url":null,"abstract":"We apply AI-assisted large language model (LLM) capabilities of GPT-3 targeting high-performance computing (HPC) kernels for (i) code generation, and (ii) auto-parallelization of serial code in C ++, Fortran, Python and Julia. Our scope includes the following fundamental numerical kernels: AXPY, GEMV, GEMM, SpMV, Jacobi Stencil, and CG, and language/programming models: (1) C++ (e.g., OpenMP [including offload], OpenACC, Kokkos, SyCL, CUDA, and HIP), (2) Fortran (e.g., OpenMP [including offload] and OpenACC), (3) Python (e.g., numpy, Numba, cuPy, and pyCUDA), and (4) Julia (e.g., Threads, CUDA.jl, AMDGPU.jl, and KernelAbstractions.jl). Kernel implementations are generated using GitHub Copilot capabilities powered by the GPT-based OpenAI Codex available in Visual Studio Code given simple <kernel> + <programming model> + <optional hints> prompt variants. To quantify and compare the generated results, we propose a proficiency metric around the initial 10 suggestions given for each prompt. For auto-parallelization, we use ChatGPT interactively giving simple prompts as in a dialogue with another human including simple “prompt engineering” follow ups. Results suggest that correct outputs for C++ correlate with the adoption and maturity of programming models. For example, OpenMP and CUDA score really high, whereas HIP is still lacking. We found that prompts from either a targeted language such as Fortran or the more general-purpose Python can benefit from adding language keywords, while Julia prompts perform acceptably well for its Threads and CUDA.jl programming models. We expect to provide an initial quantifiable point of reference for code generation in each programming model using a state-of-the-art LLM. Overall, understanding the convergence of LLMs, AI, and HPC is crucial due to its rapidly evolving nature and how it is redefining human-computer interactions.","PeriodicalId":55214,"journal":{"name":"Concurrency and Computation-Practice & Experience","volume":"36 26","pages":""},"PeriodicalIF":1.5000,"publicationDate":"2024-09-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Concurrency and Computation-Practice & Experience","FirstCategoryId":"94","ListUrlMain":"https://onlinelibrary.wiley.com/doi/10.1002/cpe.8269","RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q3","JCRName":"COMPUTER SCIENCE, SOFTWARE ENGINEERING","Score":null,"Total":0}

引用次数: 0

Abstract

We apply AI-assisted large language model (LLM) capabilities of GPT-3 targeting high-performance computing (HPC) kernels for (i) code generation, and (ii) auto-parallelization of serial code in C ++, Fortran, Python and Julia. Our scope includes the following fundamental numerical kernels: AXPY, GEMV, GEMM, SpMV, Jacobi Stencil, and CG, and language/programming models: (1) C++ (e.g., OpenMP [including offload], OpenACC, Kokkos, SyCL, CUDA, and HIP), (2) Fortran (e.g., OpenMP [including offload] and OpenACC), (3) Python (e.g., numpy, Numba, cuPy, and pyCUDA), and (4) Julia (e.g., Threads, CUDA.jl, AMDGPU.jl, and KernelAbstractions.jl). Kernel implementations are generated using GitHub Copilot capabilities powered by the GPT-based OpenAI Codex available in Visual Studio Code given simple <kernel> + <programming model> + <optional hints> prompt variants. To quantify and compare the generated results, we propose a proficiency metric around the initial 10 suggestions given for each prompt. For auto-parallelization, we use ChatGPT interactively giving simple prompts as in a dialogue with another human including simple “prompt engineering” follow ups. Results suggest that correct outputs for C++ correlate with the adoption and maturity of programming models. For example, OpenMP and CUDA score really high, whereas HIP is still lacking. We found that prompts from either a targeted language such as Fortran or the more general-purpose Python can benefit from adding language keywords, while Julia prompts perform acceptably well for its Threads and CUDA.jl programming models. We expect to provide an initial quantifiable point of reference for code generation in each programming model using a state-of-the-art LLM. Overall, understanding the convergence of LLMs, AI, and HPC is crucial due to its rapidly evolving nature and how it is redefining human-computer interactions.

查看原文本刊更多论文

高性能计算软件开发的大型语言模型评估

我们应用 GPT-3 的人工智能辅助大型语言模型（LLM）功能，针对高性能计算（HPC）内核进行（i）代码生成，以及（ii）C ++、Fortran、Python 和 Julia 中串行代码的自动并行化。我们的研究范围包括以下基本数值内核：AXPY, GEMV, GEMM, SpMV, Jacobi Stencil, and CG, and language/programming models: (1) C++ (e.g., OpenMP [including offload], OpenACC, Kokkos, SyCL, CUDA, and HIP), (2) Fortran (e.g.、(3) Python（如 numpy、Numba、cuPy 和 pyCUDA），以及 (4) Julia（如 Threads、CUDA.jl、AMDGPU.jl 和 Kernelions.jl）。内核实现是利用 GitHub Copilot 功能生成的，该功能由 Visual Studio Code 中基于 GPT 的 OpenAI Codex 提供，并给出了简单的<内核> + <编程模型> + <可选提示>提示变量。为了量化和比较生成的结果，我们围绕每个提示给出的最初 10 个建议提出了一个熟练度指标。为了实现自动并行化，我们使用 ChatGPT 以交互方式给出简单的提示，就像与另一人对话一样，包括简单的 "提示工程 "跟进。结果表明，C++ 的正确输出与编程模型的采用和成熟度相关。例如，OpenMP 和 CUDA 的得分非常高，而 HIP 仍然不足。我们发现，Fortran 等目标语言或通用性更强的 Python 可从添加语言关键词中获益，而 Julia 提示在 Threads 和 CUDA.jl 编程模型方面的表现尚可接受。我们希望使用最先进的 LLM 为每种编程模型的代码生成提供一个可量化的初始参考点。总之，理解 LLM、人工智能和 HPC 的融合至关重要，因为它具有快速发展的性质，并且正在重新定义人机交互。

本文章由计算机程序翻译，如有差异，请以英文原文为准。

求助全文

约1分钟内获得全文求助全文

来源期刊

Concurrency and Computation-Practice & Experience 工程技术-计算机：理论方法

CiteScore

5.00

自引率

10.00%

发文量

664

审稿时长

9.6 months

期刊介绍： Concurrency and Computation: Practice and Experience (CCPE) publishes high-quality, original research papers, and authoritative research review papers, in the overlapping fields of: Parallel and distributed computing; High-performance computing; Computational and data science; Artificial intelligence and machine learning; Big data applications, algorithms, and systems; Network science; Ontologies and semantics; Security and privacy; Cloud/edge/fog computing; Green computing; and Quantum computing.