GEMMV: An LLM-Based Automated Performance-Aware Framework for GEMM Verilog Generation

Impact Factor: 3.8 · CAS Zone 2 (Engineering & Technology) · JCR Q2, Engineering, Electrical & Electronic
Gaoche Zhang;Dingyang Zou;Kairui Sun;Zhihuan Chen;Meiqi Wang;Zhongfeng Wang
{"title":"GEMMV: An LLM-Based Automated Performance-Aware Framework for GEMM Verilog Generation","authors":"Gaoche Zhang;Dingyang Zou;Kairui Sun;Zhihuan Chen;Meiqi Wang;Zhongfeng Wang","doi":"10.1109/JETCAS.2025.3568712","DOIUrl":null,"url":null,"abstract":"Recent advancements in artificial intelligence (AI) models have intensified the need for specialized AI accelerators. The design of optimized general matrix multiplication (GEMM) module tailored for these accelerators is crucial but time-consuming and expertise-demanding, creating a demand for automating design processes. Large language models (LLMs), capable of generating high-quality designs from human instructions, show great promise in automating GEMM module creation. However, the GEMM module’s vast design space and stringent performance requirements, along with the limitations of datasets and the lack of hardware performance awareness of LLMs, have made previous LLM-based register transfer level (RTL) code generation efforts unsuitable for GEMM design. To tackle these challenges, this paper proposes an automated performance-aware LLM-based framework, GEMMV, for generating high-correctness and high-performance Verilog code for GEMM. This framework utilizes in-context learning based on GPT-4 to automatically generate high-quality and well-annotated Verilog code for different variants of the GEMM. Additionally, it leverages in-context learning to obtain performance awareness by integrating a multi-level performance model (MLPM) with fine-tuned LLMs. The Verilog code generated by this framework reduces latency by 3.1x and improves syntax correctness by 65% and functionality correctness by 70% compared to earlier efforts.","PeriodicalId":48827,"journal":{"name":"IEEE Journal on Emerging and Selected Topics in Circuits and Systems","volume":"15 2","pages":"325-336"},"PeriodicalIF":3.8000,"publicationDate":"2025-03-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"IEEE Journal on Emerging and Selected Topics in Circuits and Systems","FirstCategoryId":"5","ListUrlMain":"https://ieeexplore.ieee.org/document/10994474/","RegionNum":2,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q2","JCRName":"ENGINEERING, ELECTRICAL & ELECTRONIC","Score":null,"Total":0}
引用次数: 0

Abstract

Recent advancements in artificial intelligence (AI) models have intensified the need for specialized AI accelerators. The design of optimized general matrix multiplication (GEMM) modules tailored to these accelerators is crucial but time-consuming and expertise-demanding, creating a demand for automated design processes. Large language models (LLMs), capable of generating high-quality designs from human instructions, show great promise in automating GEMM module creation. However, the GEMM module's vast design space and stringent performance requirements, along with the limitations of existing datasets and LLMs' lack of hardware performance awareness, have made previous LLM-based register transfer level (RTL) code generation efforts unsuitable for GEMM design. To tackle these challenges, this paper proposes an automated, performance-aware, LLM-based framework, GEMMV, for generating high-correctness, high-performance Verilog code for GEMM. The framework uses in-context learning based on GPT-4 to automatically generate high-quality, well-annotated Verilog code for different GEMM variants. Additionally, it leverages in-context learning to obtain performance awareness by integrating a multi-level performance model (MLPM) with fine-tuned LLMs. Compared with earlier efforts, the Verilog code generated by this framework reduces latency by 3.1x and improves syntax correctness by 65% and functional correctness by 70%.
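To make the abstract concrete, below is a minimal hand-written Verilog sketch of the kind of building block a GEMM design is composed of: a single multiply-accumulate (MAC) processing element that computes one running partial sum of C = A x B. This is an illustrative sketch only; the module name, parameters, and interface are assumptions for exposition, not actual GEMMV output, and a real GEMM design would replicate and schedule many such elements.

// Minimal hand-written sketch of a GEMM building block: one
// multiply-accumulate (MAC) processing element. Names and widths
// are illustrative assumptions, not GEMMV framework output.
module mac_pe #(
    parameter DATA_W = 8,   // operand width (assumed)
    parameter ACC_W  = 32   // accumulator width (assumed)
) (
    input  wire               clk,
    input  wire               rst_n,
    input  wire               en,    // accumulate when asserted
    input  wire [DATA_W-1:0]  a,     // element streamed from matrix A
    input  wire [DATA_W-1:0]  b,     // element streamed from matrix B
    output reg  [ACC_W-1:0]   acc    // running dot-product partial sum
);
    always @(posedge clk or negedge rst_n) begin
        if (!rst_n)
            acc <= {ACC_W{1'b0}};      // clear the accumulator
        else if (en)
            acc <= acc + a * b;        // one C[i][j] partial sum per cycle
    end
endmodule

Design choices such as DATA_W, ACC_W, the number of such elements, and their interconnection (e.g., systolic array vs. adder tree) span the vast design space the abstract refers to, which is why performance-aware automated generation is valuable.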
Source Journal
CiteScore: 8.50
Self-citation rate: 2.20%
Articles per year: 86
Journal description: The IEEE Journal on Emerging and Selected Topics in Circuits and Systems is published quarterly and solicits, with particular emphasis on emerging areas, special issues on topics that cover the entire scope of the IEEE Circuits and Systems (CAS) Society, namely the theory, analysis, design, tools, and implementation of circuits and systems, spanning their theoretical foundations, applications, and architectures for signal and information processing.