CapsBeam: Accelerating Capsule Network-Based Beamformer for Ultrasound Nonsteered Plane-Wave Imaging on Field-Programmable Gate Array

IF 3.1 | Zone 2, Engineering & Technology | Q2, COMPUTER SCIENCE, HARDWARE & ARCHITECTURE

IEEE Transactions on Very Large Scale Integration (VLSI) Systems | Pub Date: 2025-04-25 | DOI: 10.1109/TVLSI.2025.3559403

Abdul Rahoof;Vivek Chaturvedi;Mahesh Raveendranatha Panicker;Muhammad Shafique

Vol. 33, No. 7, pp. 1934-1944 | IEEE Xplore: https://ieeexplore.ieee.org/document/10977768/

Citations: 0

Abstract

In recent years, there has been a growing trend toward using deep learning models to accelerate computationally complex nonreal-time beamforming algorithms in ultrasound imaging. However, because of their large size and complexity, these state-of-the-art deep learning techniques pose significant challenges when deployed on resource-constrained edge devices. In this work, we propose a novel capsule network-based beamformer called CapsBeam, designed to operate on raw radio frequency data and provide an envelope of beamformed data through nonsteered plane-wave insonification. In experiments on in vivo data, CapsBeam reduced artifacts compared to standard Delay-and-Sum (DAS) beamforming. For in vitro data, CapsBeam demonstrated a 32.31% increase in contrast, along with gains of 16.54% and 6.7% in axial and lateral resolution, respectively, compared to DAS. Similarly, on in silico data, it showed a 26% enhancement in contrast, along with improvements of 13.6% and 21.5% in axial and lateral resolution, respectively, compared to DAS. To reduce parameter redundancy and enhance computational efficiency, we pruned the model using our multilayer look-ahead kernel pruning (LAKP-ML) methodology, achieving a compression ratio of 85% without affecting image quality. Additionally, the hardware complexity of the proposed model is reduced by applying quantization, simplifying nonlinear operations, and parallelizing operations. Finally, we propose a specialized accelerator architecture for the pruned and optimized CapsBeam model, implemented on a Xilinx ZU7EV FPGA. The proposed accelerator achieved a throughput of 30 GOPS for the convolution operation and 17.4 GOPS for the dynamic routing operation.
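For context, the Delay-and-Sum (DAS) baseline that the abstract compares against can be sketched for a single pixel under nonsteered (0-degree) plane-wave transmission. This is only an illustrative sketch, not the authors' implementation: the function name `das_plane_wave`, its parameters, the use of linear interpolation, and the absence of apodization are all assumptions made here for brevity.

```python
import math

def das_plane_wave(rf, elem_x, fs, c, px, pz):
    """Delay-and-sum value at pixel (px, pz) for a 0-degree plane wave.

    rf[e][n] : sample n of receive element e (hypothetical layout)
    elem_x   : lateral position of each element, in metres
    fs, c    : sampling rate (Hz) and speed of sound (m/s)
    """
    total = 0.0
    for e, x_e in enumerate(elem_x):
        # Transmit path: a flat wavefront reaches depth pz after pz / c.
        # Receive path: the echo travels from the pixel back to element e.
        t = (pz + math.hypot(px - x_e, pz)) / c
        idx = t * fs
        i0 = int(math.floor(idx))
        if 0 <= i0 < len(rf[e]) - 1:
            # Linear interpolation between the two neighbouring samples.
            frac = idx - i0
            total += (1.0 - frac) * rf[e][i0] + frac * rf[e][i0 + 1]
    return total
```

Repeating this for every pixel gives a beamformed RF image; an envelope would then be obtained in a separate detection step, whereas the abstract describes CapsBeam as providing the envelope directly from raw RF data.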

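Source journal: IEEE Transactions on Very Large Scale Integration (VLSI) Systems (Engineering & Technology; Engineering: Electrical & Electronic)

CiteScore: 6.40

Self-citation rate: 7.10%

Articles published: 187

Review time: 3.6 months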

About the journal: The IEEE Transactions on VLSI Systems is published monthly under the co-sponsorship of the IEEE Circuits and Systems Society, the IEEE Computer Society, and the IEEE Solid-State Circuits Society. Design and realization of microelectronic systems using VLSI/ULSI technologies require close collaboration among scientists and engineers in the fields of systems architecture, logic and circuit design, chip and wafer fabrication, packaging, testing, and systems applications. Generation of specifications, design, and verification must be performed at all abstraction levels, including the system, register-transfer, logic, circuit, transistor, and process levels. To address this critical area through a common forum, the IEEE Transactions on VLSI Systems was founded. The editorial board, consisting of international experts, invites original papers that emphasize the novel systems integration aspects of microelectronic systems, including interactions among systems design and partitioning, logic and memory design, digital and analog circuit design, layout synthesis, CAD tools, chip and wafer fabrication, testing and packaging, and systems-level qualification. Thus, the coverage of these Transactions focuses on VLSI/ULSI microelectronic systems integration.