Empowering LLMs for Verilog Generation through Multi-Level Summarization

Yang Zhao, Di Huang, Chongxiao Li, Pengwei Jin, Ziyuan Nan, Tianyun Ma, Lei Qi, Yansong Pan, Zhenxing Zhang, Rui Zhang, Xishan Zhang, Zidong Du, Qi Guo, Xing Hu, Yunji Chen
{"title":"Empowering LLMs for Verilog Generation through Multi-Level Summarization","authors":"Yang Zhao, Di Huang, Chongxiao Li, Pengwei Jin, Ziyuan Nan, Tianyun Ma, Lei Qi, Yansong Pan, Zhenxing Zhang, Rui Zhang, Xishan Zhang, Zidong Du, Qi Guo, Xing Hu, Yunji Chen","doi":"arxiv-2407.10424","DOIUrl":null,"url":null,"abstract":"The increasing complexity and high costs associated with modern processor\ndesign have led to a surge in demand for processor design automation.\nInstruction-tuned large language models (LLMs) have demonstrated remarkable\nperformance in automatically generating code for general-purpose programming\nlanguages like Python. However, these methods fail on hardware description\nlanguages (HDLs) like Verilog due to the scarcity of high-quality instruction\ntuning data, as even advanced LLMs like GPT-3.5 exhibit limited performance on\nVerilog generation. Regarding this issue, we observe that (1) Verilog code\ncollected from the real world has higher quality than those generated by LLMs.\n(2) LLMs like GPT-3.5 excel in summarizing Verilog code rather than generating\nit. Based on these observations, this paper introduces CodeV, a series of\nopen-source instruction-tuned Verilog generation LLMs. Instead of generating\ndescriptions first and then getting the corresponding code from advanced LLMs,\nwe prompt the LLM with Verilog code and let the LLM generate the corresponding\nnatural language description by multi-level summarization. Experimental results\nshow that CodeV relatively surpasses the previous open-source SOTA by 14.4%\n(BetterV in VerilogEval) and 11.3% (RTLCoder in RTLLM) respectively, and also\nrelatively outperforms previous commercial SOTA GPT-4 by 22.1% in VerilogEval.","PeriodicalId":501197,"journal":{"name":"arXiv - CS - Programming Languages","volume":"39 1","pages":""},"PeriodicalIF":0.0000,"publicationDate":"2024-07-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"arXiv - CS - Programming Languages","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/arxiv-2407.10424","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 0

Abstract

The increasing complexity and high costs associated with modern processor design have led to a surge in demand for processor design automation. Instruction-tuned large language models (LLMs) have demonstrated remarkable performance in automatically generating code for general-purpose programming languages like Python. However, these methods fail on hardware description languages (HDLs) like Verilog due to the scarcity of high-quality instruction tuning data; even advanced LLMs like GPT-3.5 exhibit limited performance on Verilog generation. Regarding this issue, we observe that (1) Verilog code collected from the real world is of higher quality than that generated by LLMs, and (2) LLMs like GPT-3.5 excel at summarizing Verilog code rather than generating it. Based on these observations, this paper introduces CodeV, a series of open-source, instruction-tuned Verilog generation LLMs. Instead of generating descriptions first and then obtaining the corresponding code from advanced LLMs, we prompt the LLM with Verilog code and let it generate the corresponding natural language description through multi-level summarization. Experimental results show that CodeV relatively surpasses the previous open-source SOTA by 14.4% (BetterV on VerilogEval) and 11.3% (RTLCoder on RTLLM), respectively, and also relatively outperforms the previous commercial SOTA, GPT-4, by 22.1% on VerilogEval.
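
The abstract only sketches the data-construction idea, so the following is a minimal, hypothetical illustration of how a multi-level summarization pipeline of this kind might be wired up. The two-stage prompt split, the prompt wording, the `chat` helper, and the use of the OpenAI Python SDK with a GPT-3.5-class model are illustrative assumptions, not the paper's actual implementation.

```python
"""Hypothetical sketch of CodeV-style multi-level summarization data construction.

Assumption (not from the paper): real-world Verilog is summarized by a
GPT-3.5-class model in two stages -- a detailed module-level summary, then a
compressed high-level design requirement -- to form instruction-tuning pairs.
"""
from openai import OpenAI

client = OpenAI()            # reads OPENAI_API_KEY from the environment
MODEL = "gpt-3.5-turbo"      # stand-in for whichever summarization LLM is used


def chat(prompt: str) -> str:
    """Single-turn query to the summarization LLM."""
    resp = client.chat.completions.create(
        model=MODEL,
        messages=[{"role": "user", "content": prompt}],
    )
    return resp.choices[0].message.content


def build_instruction_pair(verilog_code: str) -> dict:
    """Turn one collected Verilog module into a (description, code) training pair."""
    # Level 1: a detailed summary of the module's ports, parameters, and logic.
    detailed = chat(
        "Summarize the functionality of this Verilog module in detail, "
        "covering its ports, parameters, and internal logic:\n\n" + verilog_code
    )
    # Level 2: compress the detailed summary into a short, high-level
    # specification phrased as a design request (the instruction).
    instruction = chat(
        "Rewrite the following summary as a concise design requirement that a "
        "hardware engineer might give to an assistant:\n\n" + detailed
    )
    # The pair is used for instruction tuning: description -> Verilog code.
    return {"instruction": instruction, "output": verilog_code}


if __name__ == "__main__":
    example = "module and_gate(input a, input b, output y); assign y = a & b; endmodule"
    print(build_instruction_pair(example))
```

The point of reversing the usual direction (code-to-description rather than description-to-code) is that the high-quality side of each training pair is the human-written Verilog collected from the real world, while the LLM only performs summarization, the task the abstract notes it is comparatively good at.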