Kevin J. Phillipson;Michael G. Rywalt;Baibhab Chatterjee;Eric M. Schwartz;Greg Stitt
IEEE Embedded Systems Letters, vol. 16, no. 4, pp. 373-376. Published 2024-12-05. DOI: 10.1109/LES.2024.3447695. https://ieeexplore.ieee.org/document/10779513/
Novel Toolset for Efficient Hardwired Micro-Op Translation in Embedded Microarchitectures
Modern SoCs require increasingly complex embedded control deep within their numerous sub-blocks without adding significant die area. This motivated the creation of μRTL, a novel toolset for systematically designing efficient pipelined implementations of embedded instruction sets originally intended for multicycle execution. μRTL utilizes hardwired micro-op translation, a technique commonly used in the instruction decoders of large superscalar microprocessors; however, this technique has been overlooked for designing smaller, more efficient embedded microprocessors. Furthermore, the tools to develop instruction decoders with micro-op translation are proprietary and the techniques are trade secrets. The μRTL toolset is open-source, and this letter clearly presents the methodology. The methodology emphasizes direct opcode decoding from multiple synthesized Verilog blocks, in contrast to traditional microprogramming, which uses sequential decoding from a ROM. Our results show that a pipelined μRTL microarchitecture achieves a 21.8% reduction in size compared to a hardwired multicycle implementation of the same instruction set. Additionally, the performance of 0.75 DMIPS/MHz surpasses the RISC-V PicoRV32 by 44.2% and the AVR RISC by 82.9%. These improvements in performance, power, and area are of interest to embedded system architects.
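The percentages in the abstract can be turned back into rough baseline figures. The sketch below is only a reading aid, not data from the letter: it assumes that "surpasses by X%" means the reported score equals the baseline multiplied by (1 + X/100), and that the 21.8% size reduction is relative to the multicycle design. The helper name `implied_baseline` is hypothetical. Because 0.75 DMIPS/MHz is itself a rounded value, the results are approximate.

```python
def implied_baseline(score: float, improvement_pct: float) -> float:
    """Return the baseline b such that score == b * (1 + improvement_pct / 100).

    Assumption: the abstract's "surpasses by X%" is a simple ratio.
    """
    return score / (1.0 + improvement_pct / 100.0)


# Figure quoted in the abstract (rounded there, so results are approximate).
REPORTED_DMIPS_PER_MHZ = 0.75

# Baselines implied by the quoted percentages, not measured values.
baselines = {
    "PicoRV32": implied_baseline(REPORTED_DMIPS_PER_MHZ, 44.2),
    "AVR": implied_baseline(REPORTED_DMIPS_PER_MHZ, 82.9),
}

# A 21.8% reduction leaves this fraction of the multicycle design's size.
relative_area = 1.0 - 21.8 / 100.0

for name, value in baselines.items():
    print(f"{name}: about {value:.2f} DMIPS/MHz implied")
print(f"Size relative to the multicycle implementation: {relative_area:.3f}")
```

Under these assumptions the implied baselines come out near 0.52 DMIPS/MHz for the PicoRV32 and 0.41 DMIPS/MHz for the AVR, and the pipelined design occupies about 0.782 of the multicycle design's size.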
About the journal:
IEEE Embedded Systems Letters (ESL) provides a forum for rapid dissemination of the latest technical advances in embedded systems and related areas in embedded software. The emphasis is on models, methods, and tools that ensure secure, correct, efficient, and robust design of embedded systems and their applications.