{"title":"Efficient Design Space Exploration for the BOOM Using SAC-Based Reinforcement Learning","authors":"Mingjun Cheng;Shihan Zhang;Xin Zheng;Xian Lin;Huaien Gao;Shuting Cai;Xiaoming Xiong;Bei Yu","doi":"10.1109/TVLSI.2025.3572799","DOIUrl":null,"url":null,"abstract":"Design space exploration (DSE) is crucial for optimizing the performance, power, and area (PPA) of CPU microarchitectures (<inline-formula> <tex-math>$\\mu $ </tex-math></inline-formula>-archs). While various machine learning (ML) algorithms have been applied to the <inline-formula> <tex-math>$\\mu $ </tex-math></inline-formula>-arch DSE problem, the potential of reinforcement learning (RL) remains underexplored. In this article, we propose a novel RL-based approach to address the reduced instruction set computer V (RISC-V) CPU <inline-formula> <tex-math>$\\mu $ </tex-math></inline-formula>-arch DSE problem. This approach enables dynamic selection and optimization of <inline-formula> <tex-math>$\\mu $ </tex-math></inline-formula>-arch parameters without relying on predefined modification sequences, thus significantly enhancing exploration flexibility. To address the challenges posed by high-dimensional action spaces and sparse rewards, we use a discrete soft actor-critic (SAC) framework with entropy maximization to promote efficient exploration. In addition, we integrate multistep temporal-difference (TD) learning, an experience replay (ER) buffer, and return normalization to improve sample efficiency and learning stability during training. Our method further aligns optimization with user-defined preferences by normalizing PPA metrics relative to baseline designs. Experimental results on the Berkeley out-of-order machine (BOOM) demonstrate that the proposed approach achieves superior performance compared with state-of-the-art methods, showcasing its effectiveness and efficiency for <inline-formula> <tex-math>$\\mu $ </tex-math></inline-formula>-arch DSE. Our code is available at <uri>https://github.com/exhaust-create/SAC-DSE</uri>.","PeriodicalId":13425,"journal":{"name":"IEEE Transactions on Very Large Scale Integration (VLSI) Systems","volume":"33 8","pages":"2252-2263"},"PeriodicalIF":3.1000,"publicationDate":"2025-06-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"IEEE Transactions on Very Large Scale Integration (VLSI) Systems","FirstCategoryId":"5","ListUrlMain":"https://ieeexplore.ieee.org/document/11038948/","RegionNum":2,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q2","JCRName":"COMPUTER SCIENCE, HARDWARE & ARCHITECTURE","Score":null,"Total":0}
引用次数: 0
Abstract
Design space exploration (DSE) is crucial for optimizing the performance, power, and area (PPA) of CPU microarchitectures ($\mu $ -archs). While various machine learning (ML) algorithms have been applied to the $\mu $ -arch DSE problem, the potential of reinforcement learning (RL) remains underexplored. In this article, we propose a novel RL-based approach to address the reduced instruction set computer V (RISC-V) CPU $\mu $ -arch DSE problem. This approach enables dynamic selection and optimization of $\mu $ -arch parameters without relying on predefined modification sequences, thus significantly enhancing exploration flexibility. To address the challenges posed by high-dimensional action spaces and sparse rewards, we use a discrete soft actor-critic (SAC) framework with entropy maximization to promote efficient exploration. In addition, we integrate multistep temporal-difference (TD) learning, an experience replay (ER) buffer, and return normalization to improve sample efficiency and learning stability during training. Our method further aligns optimization with user-defined preferences by normalizing PPA metrics relative to baseline designs. Experimental results on the Berkeley out-of-order machine (BOOM) demonstrate that the proposed approach achieves superior performance compared with state-of-the-art methods, showcasing its effectiveness and efficiency for $\mu $ -arch DSE. Our code is available at https://github.com/exhaust-create/SAC-DSE.
期刊介绍:
The IEEE Transactions on VLSI Systems is published as a monthly journal under the co-sponsorship of the IEEE Circuits and Systems Society, the IEEE Computer Society, and the IEEE Solid-State Circuits Society.
Design and realization of microelectronic systems using VLSI/ULSI technologies require close collaboration among scientists and engineers in the fields of systems architecture, logic and circuit design, chips and wafer fabrication, packaging, testing and systems applications. Generation of specifications, design and verification must be performed at all abstraction levels, including the system, register-transfer, logic, circuit, transistor and process levels.
To address this critical area through a common forum, the IEEE Transactions on VLSI Systems have been founded. The editorial board, consisting of international experts, invites original papers which emphasize and merit the novel systems integration aspects of microelectronic systems including interactions among systems design and partitioning, logic and memory design, digital and analog circuit design, layout synthesis, CAD tools, chips and wafer fabrication, testing and packaging, and systems level qualification. Thus, the coverage of these Transactions will focus on VLSI/ULSI microelectronic systems integration.