AIA: A Customized Multi-Core RISC-V SoC for Discrete Sampling Workloads in 16 nm
Authors: Shirui Zhao, Nimish Shah, Wannes Meert, Marian Verhelst
Journal: IEEE Journal of Solid-State Circuits, vol. 60, no. 7, pp. 2447-2460
Published: 2025-04-30
DOI: 10.1109/JSSC.2025.3561880
URL: https://ieeexplore.ieee.org/document/10980265/
Citations: 0
Abstract
Probabilistic models (PMs) are essential in advancing machine learning capabilities, particularly in safety-critical applications involving reasoning and decision-making. Among the methods employed for inference in these models, sampling-based Markov chain Monte Carlo (MCMC) techniques are widely used. However, MCMC methods come with significant computational costs and are inherently challenging to parallelize, resulting in inefficient execution on conventional CPU/GPU platforms. To overcome these challenges, this article presents an approximate inference accelerator (AIA), a multi-core RISC-V system-on-chip (SoC) fabricated in Intel’s 16 nm process technology. Our AIA is specifically designed to empower edge devices with robust decision-making and reasoning abilities. The AIA architecture incorporates a RISC-V host processor to manage chip-to-chip data communication and a 2-D mesh of 16 custom versatile RISC-V cores optimized for high-efficiency approximate inference. Each core features: 1) custom instructions and datapath blocks for non-normalized Knuth-Yao (KY) sampling, as well as for the interpolation of non-linear functions (e.g., logarithmic and exponential), and 2) direct access to the register file (RF) of each neighboring core, to reduce the data movement costs of frequent data exchanges between nearby cores. To further capitalize on the parallelism potential in MCMC algorithms, we developed a specialized compile chain that enables efficient spatial mapping and scheduling across the cores. As a result, AIA attains a peak sampling rate of 1277 MSamples/s at 0.9 V and achieves an energy efficiency of 20 GSamples/s/W at 0.7 V, surpassing the previous state-of-the-art (SotA) ASIC accelerator for probabilistic inference by up to 6× in speed and 5× in energy efficiency. Furthermore, the AIA’s versatility is demonstrated through the successful mapping of different types of PM workloads onto the chip.
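To make the MCMC workload mentioned in the abstract concrete, here is a minimal Python sketch of Gibbs sampling, one common MCMC method, on a hypothetical toy model (a chain of four binary variables with integer factor tables chosen only for illustration). Each step draws one variable from its conditional distribution, which is known only up to a normalizing constant, so the sampler consumes non-normalized weights. The loop also shows why MCMC is hard to parallelize naively: every update reads the latest values of its neighbors. This is not the authors' compile chain or core mapping; `UNARY`, `PAIR`, and the helper names are assumptions.

```python
import itertools
import random

# Hypothetical toy model: binary chain x0 - x1 - x2 - x3 with integer
# (non-normalized) factor tables, chosen only for illustration.
UNARY = [[1, 3], [1, 1], [1, 1], [1, 1]]   # x0 prefers value 1
PAIR = [[2, 1], [1, 2]]                    # favours equal neighbours
N = len(UNARY)


def joint_weight(x):
    """Unnormalized probability of a full assignment x."""
    w = 1
    for i, xi in enumerate(x):
        w *= UNARY[i][xi]
    for i in range(N - 1):
        w *= PAIR[x[i]][x[i + 1]]
    return w


def conditional_weights(x, i):
    """Unnormalized P(x_i = v | rest); only factors touching x_i matter."""
    ws = []
    for v in (0, 1):
        w = UNARY[i][v]
        if i > 0:
            w *= PAIR[x[i - 1]][v]
        if i < N - 1:
            w *= PAIR[v][x[i + 1]]
        ws.append(w)
    return ws


def gibbs_marginals(sweeps, burn_in, seed=0):
    """Estimate P(x_i = 1) for every variable by sequential-scan Gibbs sampling."""
    rng = random.Random(seed)
    x = [0] * N
    ones = [0] * N
    for s in range(sweeps):
        for i in range(N):
            # random.choices accepts relative (non-normalized) weights
            x[i] = rng.choices((0, 1), weights=conditional_weights(x, i))[0]
        if s >= burn_in:
            for i in range(N):
                ones[i] += x[i]
    kept = sweeps - burn_in
    return [c / kept for c in ones]


def exact_marginals():
    """Brute-force reference: enumerate all 2**N assignments."""
    states = list(itertools.product((0, 1), repeat=N))
    z = sum(joint_weight(x) for x in states)
    return [sum(joint_weight(x) for x in states if x[i] == 1) / z for i in range(N)]


if __name__ == "__main__":
    print("Gibbs:", gibbs_marginals(40000, 1000))
    print("exact:", exact_marginals())
```

Enumeration is only feasible for tiny models like this one; it is included here purely as a check on the sampler.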
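The abstract states that each core has datapath support for non-normalized Knuth-Yao (KY) sampling. As a software illustration of the idea (not the chip's actual datapath), the Python sketch below walks the KY discrete distribution generating (DDG) tree one random bit at a time, reading the binary expansions of integer weights column by column. Assumption: to handle weights whose sum is not a power of two, the total is padded to the next power of two with an extra "reject" outcome that restarts the walk, a trick borrowed from the Fast Loaded Dice Roller; the hardware may handle normalization differently. The function names are hypothetical.

```python
import random


def _ky_walk(padded, k, rng):
    """One walk down the Knuth-Yao DDG tree; requires sum(padded) == 2**k."""
    d = 0
    for col in range(k):                     # bit columns, most significant first
        d = 2 * d + rng.getrandbits(1)       # descend one level using one random bit
        for i, w in enumerate(padded):
            d -= (w >> (k - 1 - col)) & 1    # a 1 bit in this column is a leaf for outcome i
            if d < 0:
                return i
    raise AssertionError("unreachable: no internal nodes remain below depth k")


def ky_sample(weights, rng):
    """Return index i with probability weights[i] / sum(weights).

    `weights` are non-negative integers that need not sum to a power of two.
    Assumption: the total is padded to the next power of two with an extra
    'reject' outcome (Fast Loaded Dice Roller style); a reject restarts.
    """
    total = sum(weights)
    if total <= 0 or any(w < 0 for w in weights):
        raise ValueError("weights must be non-negative with a positive sum")
    k = max(1, (total - 1).bit_length())     # smallest k >= 1 with 2**k >= total
    padded = list(weights) + [(1 << k) - total]
    reject = len(weights)                    # index of the padding outcome
    while True:
        i = _ky_walk(padded, k, rng)
        if i != reject:
            return i


if __name__ == "__main__":
    rng = random.Random(1)
    draws = [ky_sample([1, 2, 5], rng) for _ in range(8000)]
    # empirical frequencies should be close to 1/8, 2/8 and 5/8
    print([draws.count(i) / len(draws) for i in range(3)])
```

Because the sum of the padded weights is exactly 2**k, every walk ends at a leaf within k bits, and an outcome with weight w_i is reached with probability w_i / 2**k. The sampler never divides by the total, which is what makes it attractive for unnormalized MCMC conditionals.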
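The abstract also mentions custom instructions for interpolating non-linear functions such as log and exp. A common way to realize this is a small lookup table of breakpoints plus linear interpolation between them. The Python sketch below assumes uniform spacing, so the segment index is just the scaled input truncated to an integer (a shift in fixed-point hardware). The ranges, segment counts, and helper names (`build_pwl`, `eval_pwl`) are hypothetical; the chip's actual table format and precision are not described here.

```python
import math


def build_pwl(f, lo, hi, segments):
    """Tabulate f at segments + 1 uniformly spaced breakpoints on [lo, hi]."""
    step = (hi - lo) / segments
    ys = [f(lo + j * step) for j in range(segments + 1)]
    return lo, step, ys


def eval_pwl(table, x):
    """Piecewise-linear interpolation between the two nearest breakpoints."""
    lo, step, ys = table
    segments = len(ys) - 1
    t = (x - lo) / step
    t = min(max(t, 0.0), float(segments))   # clamp to the tabulated range
    j = min(int(t), segments - 1)           # segment index
    frac = t - j                            # position inside the segment, in [0, 1]
    return ys[j] + frac * (ys[j + 1] - ys[j])


if __name__ == "__main__":
    exp_table = build_pwl(math.exp, -8.0, 0.0, 64)   # hypothetical range and size
    log_table = build_pwl(math.log, 1.0, 2.0, 16)    # e.g. log of a mantissa in [1, 2)
    print(eval_pwl(exp_table, -1.3), math.exp(-1.3))
    print(eval_pwl(log_table, 1.7), math.log(1.7))
```

For a twice-differentiable f, the error of linear interpolation on a segment of width h is at most h^2/8 times the maximum of |f''| on that segment, so the table size can be chosen from the accuracy the sampler needs.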
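The headline figures can be restated in more intuitive units. Inverting the energy efficiency gives the energy per sample at 0.7 V. Dividing the peak rate by the core count gives an average per-core rate at 0.9 V, under the assumption that all 16 cores contribute equally. The two numbers are measured at different supply voltages, so they should not be multiplied together to infer power.

```latex
E_{\text{sample}} = \frac{1}{20 \times 10^{9}\ \text{samples/s/W}}
                  = 5 \times 10^{-11}\ \text{J} = 50\ \text{pJ per sample} \quad (0.7\ \text{V})

\frac{1277\ \text{MSamples/s}}{16\ \text{cores}} \approx 79.8\ \text{MSamples/s per core} \quad (0.9\ \text{V})
```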
About the journal:
The IEEE Journal of Solid-State Circuits publishes papers each month in the broad area of solid-state circuits with particular emphasis on transistor-level design of integrated circuits. It also provides coverage of topics such as circuits modeling, technology, systems design, layout, and testing that relate directly to IC design. Integrated circuits and VLSI are of principal interest; material related to discrete circuit design is seldom published. Experimental verification is strongly encouraged.