A RISC-V Domain-Specific Processor for Deep Learning-Based Channel Estimation

IF 5.2; Zone 1, Engineering and Technology; Q1 ENGINEERING, ELECTRICAL & ELECTRONIC

IEEE Transactions on Circuits and Systems I: Regular Papers Pub Date : 2025-03-26 DOI:10.1109/TCSI.2025.3547319

Meng Guo;Qi Wu;Chuanning Wang;Yangcan Zhou;Shaowei Wang;Chuan Zhang;Zhongfeng Wang;Jun Lin

Volume 72, issue 5, pages 2380-2393. Full text: https://ieeexplore.ieee.org/document/10939000/

Citations: 0

Abstract

Channel estimation (CE) is a critical component of massive multi-input multi-output (MIMO) communication systems. Compared with conventional CE algorithms, the deep learning (DL)-based approach has become a promising alternative because it offers enhanced performance and robustness across diverse scenarios. However, efficient DL-based CE algorithms have two key properties that make them challenging to implement on existing edge-side architectures: the diversity of deep neural networks (DNNs) and CE strategies, and the involvement of multiple computation-intensive tasks that encompass conventional signal processing, artificial intelligence (AI) inference, and online learning. To address these challenges, a domain-specific processor based on an extended RISC-V instruction set architecture (ISA) is proposed to perform these DL-based CE algorithms. First, a dedicated RISC-V ISA extension is developed to flexibly support all essential operations required by a DL-based CE algorithm, such as matrix inversion. Building on the customized ISA extension, a highly adaptable and scalable RISC-V processor is developed, featuring scalar and vector posit arithmetic units to alleviate the high computational and memory demands of DNNs during both the inference and training phases. Additionally, a coarse-grained matrix accelerator is integrated to expedite various matrix operations, ensuring high throughput. In this way, both high flexibility and computational efficiency are achieved. Finally, the processor is implemented in TSMC 28-nm technology. Implementation results show that the processor achieves a speedup of $5.16\sim 6.80\times$ for all matrix operations compared with the state-of-the-art work. Moreover, the proposed processor provides an area efficiency improvement of $1.61\times$ and an energy efficiency enhancement of $6.6\sim 15.4\times$ compared to the open-source vector processor Ara. Notably, this work is the first RISC-V domain-specific processor tailored for diverse DL-based CE algorithms.
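For readers unfamiliar with the posit number format mentioned in the abstract, the following is a minimal software sketch of how a posit bit pattern maps to a real value (sign, regime, exponent, fraction). The abstract does not state which posit width or exponent size the processor uses, so the parameters `n` and `es` and the helper name `decode_posit` are illustrative assumptions, not the authors' hardware design.

```python
import math

def decode_posit(bits: int, n: int = 8, es: int = 1) -> float:
    """Decode an n-bit posit with es exponent bits into a float.

    Hypothetical helper for illustration; n and es are assumed values.
    """
    mask = (1 << n) - 1
    bits &= mask
    if bits == 0:
        return 0.0
    if bits == 1 << (n - 1):
        return math.nan  # NaR ("not a real") has no float equivalent
    negative = bool(bits >> (n - 1))
    if negative:
        bits = (-bits) & mask  # negative posits are stored in two's complement
    rem = n - 1
    body = bits & ((1 << rem) - 1)  # everything after the sign bit
    # Regime: a run of identical bits ended by the opposite bit (or end of word).
    first = (body >> (rem - 1)) & 1
    run = 0
    i = rem - 1
    while i >= 0 and ((body >> i) & 1) == first:
        run += 1
        i -= 1
    k = run - 1 if first else -run
    i -= 1  # skip the regime terminator bit
    remaining = max(i + 1, 0)  # bits left for exponent and fraction
    tail = body & ((1 << remaining) - 1)
    # Exponent: up to es bits; bits cut off by the word end count as zeros.
    exp_bits = min(es, remaining)
    exponent = (tail >> (remaining - exp_bits)) << (es - exp_bits) if exp_bits else 0
    # Fraction: whatever is left, with an implicit leading 1.
    frac_bits = remaining - exp_bits
    frac = tail & ((1 << frac_bits) - 1)
    fraction = 1.0 + (frac / (1 << frac_bits) if frac_bits else 0.0)
    # value = (2^(2^es))^k * 2^exponent * fraction
    value = math.ldexp(fraction, k * (1 << es) + exponent)
    return -value if negative else value
```

The regime field gives posits tapered precision: values near 1 keep more fraction bits, while very large or very small values trade fraction bits for dynamic range. This is one commonly cited reason posits are considered for reducing DNN memory demands at short word lengths.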


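Source journal

IEEE Transactions on Circuits and Systems I: Regular Papers (Engineering and Technology, Engineering: Electrical and Electronic)

CiteScore: 9.80

Self-citation rate: 11.80%

Publication volume: 441

Review time: 2 months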

Journal description: TCAS I publishes regular papers in the field specified by the theory, analysis, design, and practical implementations of circuits, and the application of circuit techniques to systems and to signal processing. Included is the whole spectrum from basic scientific theory to industrial applications. The field of interest covered includes:
- Circuits: Analog, Digital and Mixed Signal Circuits and Systems
- Nonlinear Circuits and Systems, Integrated Sensors, MEMS and Systems on Chip, Nanoscale Circuits and Systems, Optoelectronic Circuits and Systems, Power Electronics and Systems
- Software for Analog-and-Logic Circuits and Systems
- Control aspects of Circuits and Systems