Novel Toolset for Efficient Hardwired Micro-Op Translation in Embedded Microarchitectures

IF 1.7; Zone 4, Computer Science; JCR Q3, COMPUTER SCIENCE, HARDWARE & ARCHITECTURE

IEEE Embedded Systems Letters Pub Date : 2024-12-05 DOI:10.1109/LES.2024.3447695

Kevin J. Phillipson;Michael G. Rywalt;Baibhab Chatterjee;Eric M. Schwartz;Greg Stitt

IEEE Embedded Systems Letters, vol. 16, no. 4, pp. 373-376. https://ieeexplore.ieee.org/document/10779513/

Citations: 0

Abstract


Novel Toolset for Efficient Hardwired Micro-Op Translation in Embedded Microarchitectures

Modern SoCs require increasingly complex embedded control deep within their numerous sub-blocks without adding significant die area. This motivated the creation of $\mu$RTL, a novel toolset for systematically designing efficient pipelined implementations of embedded instruction sets originally intended for multicycle execution. $\mu$RTL utilizes hardwired micro-op translation, a technique commonly used in the instruction decoders of large super-scalar microprocessors, however this technique has been overlooked for designing smaller, more efficient embedded microprocessors. Furthermore, the tools to develop instruction decoders with micro-op translation are proprietary and the techniques are trade secrets. The $\mu$RTL toolset is open-source and this letter clearly presents the methodology. The methodology emphasizes direct opcode decoding from multiple synthesized Verilog blocks versus traditional microprogramming which uses sequential decoding from a ROM. Our results show that a pipelined $\mu$RTL microarchitecture achieves a 21.8% reduction in size compared to a hardwired multicycle implementation of the same instruction set. Additionally, the performance of 0.75 DMIPS/MHz surpasses the RISC-V PicoRV32 by 44.2% and the AVR RISC by 82.9%. These improvements in performance, power, and area are of interest to embedded system architects.
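The percentages in the abstract can be turned back into the baseline figures they imply. The sketch below is only arithmetic on the numbers quoted above. It assumes that "surpasses by X%" means the stated 0.75 DMIPS/MHz is (1 + X/100) times the baseline, and that a "21.8% reduction in size" is measured relative to the multicycle design. The function and variable names are illustrative, and the derived baselines are implied values, not figures reported by the authors.

```python
def implied_baseline(value: float, percent_gain: float) -> float:
    """Return the baseline b such that `value` exceeds b by `percent_gain` percent.

    Assumption: value = b * (1 + percent_gain / 100).
    """
    return value / (1.0 + percent_gain / 100.0)


# Figures quoted in the abstract.
MU_RTL_DMIPS_PER_MHZ = 0.75
GAIN_OVER_PICORV32_PCT = 44.2
GAIN_OVER_AVR_PCT = 82.9
AREA_REDUCTION_PCT = 21.8

# Baselines implied by the quoted percentages (derived, not reported).
picorv32 = implied_baseline(MU_RTL_DMIPS_PER_MHZ, GAIN_OVER_PICORV32_PCT)
avr = implied_baseline(MU_RTL_DMIPS_PER_MHZ, GAIN_OVER_AVR_PCT)

# Size of the pipelined design as a fraction of the multicycle design.
relative_area = 1.0 - AREA_REDUCTION_PCT / 100.0

print(f"implied PicoRV32 baseline: {picorv32:.3f} DMIPS/MHz")
print(f"implied AVR baseline:      {avr:.3f} DMIPS/MHz")
print(f"relative area:             {relative_area:.3f}x the multicycle design")
```

Under these assumptions the implied baselines come out near 0.52 DMIPS/MHz for the PicoRV32 and 0.41 DMIPS/MHz for the AVR, with the pipelined design occupying about 0.782 times the area of the multicycle one.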


Source journal

IEEE Embedded Systems Letters Engineering-Control and Systems Engineering

CiteScore

3.30

Self-citation rate

0.00%

Publication count

About the journal: IEEE Embedded Systems Letters (ESL) provides a forum for rapid dissemination of the latest technical advances in embedded systems and related areas in embedded software. The emphasis is on models, methods, and tools that ensure secure, correct, efficient, and robust design of embedded systems and their applications.