Design and Optimization of Efficient Digital Machine Learning Accelerators: An overview of architecture choices, efficient quantization, sparsity exploration, and system integration techniques
{"title":"Design and Optimization of Efficient Digital Machine Learning Accelerators: An overview of architecture choices, efficient quantization, sparsity exploration, and system integration techniques","authors":"Wei Tang;Sung-Gun Cho;Jie-Fang Zhang;Zhengya Zhang","doi":"10.1109/MSSC.2025.3549361","DOIUrl":null,"url":null,"abstract":"Digital machine learning (ML) accelerators are popular and widely used. We provide an overview of the SIMD and systolic array architectures that form the foundation of many accelerator designs. The demand for higher compute density, energy efficiency, and scalability has been increasing. To address these needs, new ML accelerator designs have adopted a range of techniques, including advanced architectural design, more efficient quantization, exploiting data-level sparsity, and leveraging new integration technologies. For each of these techniques, we review the common approaches, identify the design tradeoffs, and discuss their implications.","PeriodicalId":100636,"journal":{"name":"IEEE Solid-State Circuits Magazine","volume":"17 2","pages":"30-38"},"PeriodicalIF":0.0000,"publicationDate":"2025-06-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"IEEE Solid-State Circuits Magazine","FirstCategoryId":"1085","ListUrlMain":"https://ieeexplore.ieee.org/document/11044989/","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 0
Abstract
Digital machine learning (ML) accelerators are widely deployed. We provide an overview of the SIMD and systolic array architectures that form the foundation of many accelerator designs. As the demand for higher compute density, energy efficiency, and scalability continues to grow, new ML accelerator designs have adopted a range of techniques, including advanced architectural design, more efficient quantization, exploitation of data-level sparsity, and new integration technologies. For each of these techniques, we review the common approaches, identify the design tradeoffs, and discuss their implications.
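As a point of reference for the systolic array dataflow the overview builds on, the sketch below is an illustrative, cycle-level Python model, not taken from the article, of an output-stationary array computing C = A·B. The function name `systolic_matmul` and the skewed-operand scheduling are assumptions chosen for clarity.

```python
import numpy as np

def systolic_matmul(A, B):
    """Cycle-level model of an output-stationary systolic array
    computing C = A @ B. PE (i, j) owns accumulator C[i, j] and
    performs at most one multiply-accumulate per cycle."""
    M, K = A.shape
    K2, N = B.shape
    assert K == K2, "inner dimensions must match"
    C = np.zeros((M, N), dtype=np.result_type(A, B))
    # Operands are skewed in time: row i of A enters i cycles late,
    # column j of B enters j cycles late, so the operand pair
    # (A[i, k], B[k, j]) reaches PE (i, j) at cycle t = i + j + k.
    for t in range(M + N + K - 2):
        for i in range(M):
            for j in range(N):
                k = t - i - j  # operand index arriving this cycle
                if 0 <= k < K:
                    C[i, j] += A[i, k] * B[k, j]
    return C

# Quick check against a dense reference multiply.
A = np.random.randint(-8, 8, size=(4, 6))
B = np.random.randint(-8, 8, size=(6, 5))
assert np.array_equal(systolic_matmul(A, B), A @ B)
```

The output-stationary dataflow modeled here keeps each partial sum pinned to its processing element, so only the A and B operands move between neighbors each cycle; weight-stationary and other variants trade this accumulator locality for different operand-reuse patterns.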