{"title":"Opal: A 16-nm Coarse-Grained Reconfigurable Array SoC for Full Sparse Machine Learning Applications","authors":"Po-Han Chen;Bo Wun Cheng;Michael Oduoza;Zhouhua Xie;Rupert Lu;Sai Gautham Ravipati;Kalhan Koul;Alex Carsello;Yuchen Mei;Mark Horowitz;Priyanka Raina","doi":"10.1109/LSSC.2025.3613245","DOIUrl":null,"url":null,"abstract":"Sparsity has recently attracted increased attention in the machine learning (ML) community due to its potential to improve performance and energy efficiency by eliminating ineffectual computations. As ML models evolve rapidly, reconfigurable architectures, such as coarse-grained reconfigurable arrays (CGRAs), are being explored to adapt to and accelerate emerging models. Previous CGRA designs have supported unstructured sparsity and reported promising speedups and energy savings for compute-intensive kernels. However, these approaches still face performance bottlenecks when accelerating entire sparse ML networks. In this letter, we identify the primary sources of inefficiency in prior CGRA-based approaches and present Opal, a CGRA SoC with three key contributions: 1) flexible dataflow architecture supporting Gustavson’s dataflow for sparse matrix multiplication; 2) high-throughput sparse hardware primitives; and 3) enhanced processing elements to support mapping all ML operations on the CGRA. As a result, Opal achieves a 66% to 79% reduction in runtime and energy consumption across our evaluated sparse graph neural network benchmarks compared to prior CGRA solutions which only target kernel acceleration.","PeriodicalId":13032,"journal":{"name":"IEEE Solid-State Circuits Letters","volume":"8 ","pages":"293-296"},"PeriodicalIF":2.0000,"publicationDate":"2025-09-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"IEEE Solid-State Circuits Letters","FirstCategoryId":"1085","ListUrlMain":"https://ieeexplore.ieee.org/document/11175337/","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q3","JCRName":"COMPUTER SCIENCE, HARDWARE & ARCHITECTURE","Score":null,"Total":0}
Abstract
Sparsity has recently attracted increased attention in the machine learning (ML) community due to its potential to improve performance and energy efficiency by eliminating ineffectual computations. As ML models evolve rapidly, reconfigurable architectures, such as coarse-grained reconfigurable arrays (CGRAs), are being explored to adapt to and accelerate emerging models. Previous CGRA designs have supported unstructured sparsity and reported promising speedups and energy savings for compute-intensive kernels. However, these approaches still face performance bottlenecks when accelerating entire sparse ML networks. In this letter, we identify the primary sources of inefficiency in prior CGRA-based approaches and present Opal, a CGRA SoC with three key contributions: 1) a flexible dataflow architecture supporting Gustavson’s dataflow for sparse matrix multiplication; 2) high-throughput sparse hardware primitives; and 3) enhanced processing elements that support mapping all ML operations onto the CGRA. As a result, Opal achieves a 66% to 79% reduction in runtime and energy consumption across our evaluated sparse graph neural network benchmarks compared to prior CGRA solutions, which only target kernel acceleration.
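For readers unfamiliar with Gustavson’s dataflow referenced in the abstract, the following is a minimal software sketch (not taken from the letter, and not representative of Opal’s hardware implementation) of the row-wise product formulation of sparse matrix multiplication that it describes: for each nonzero A[i, k], row k of B is scaled and merged into a sparse accumulator for row i of C. The function name, argument layout (CSR arrays), and dict-based output are assumptions made for illustration.

```python
def gustavson_spgemm(a_rowptr, a_cols, a_vals, b_rowptr, b_cols, b_vals):
    """Compute C = A @ B one output row at a time (Gustavson's row-wise dataflow).

    A and B are given in CSR form (row pointers, column indices, values).
    Returns a list of {column: value} dicts, one per output row.
    """
    num_rows = len(a_rowptr) - 1
    c_rows = []
    for i in range(num_rows):
        acc = {}  # sparse accumulator for row i of C: column index -> partial sum
        # For each nonzero A[i, k], scale row k of B and merge it into the accumulator.
        for p in range(a_rowptr[i], a_rowptr[i + 1]):
            k, a_ik = a_cols[p], a_vals[p]
            for q in range(b_rowptr[k], b_rowptr[k + 1]):
                j, b_kj = b_cols[q], b_vals[q]
                acc[j] = acc.get(j, 0) + a_ik * b_kj
        c_rows.append(dict(sorted(acc.items())))
    return c_rows

# Tiny example: A = [[1, 0], [0, 2]], B = [[0, 3], [4, 0]] in CSR form.
print(gustavson_spgemm([0, 1, 2], [0, 1], [1, 2],
                       [0, 1, 2], [1, 0], [3, 4]))
# -> [{1: 3}, {0: 8}]
```

The key property of this dataflow is that only one row of the output is accumulated at a time, which keeps the accumulator small and avoids materializing dense intermediates; this is the scheduling pattern the letter says Opal supports in hardware.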