FPGA-Accelerated CNN Reconstruction for Low-Power Sparse-Array Ultrasound Imaging

IF 3.7 · CAS Zone 2 (Engineering & Technology) · JCR Q1 (Acoustics)

IEEE Transactions on Ultrasonics, Ferroelectrics, and Frequency Control · Pub Date: 2025-11-07 · DOI: 10.1109/TUFFC.2025.3630483

Rouzbeh Molaei Imenabadi;Gregory R. Thoreson;Katherine G. Brown;Dinesh Bhatia

Volume 72, no. 12, pp. 1618-1636 · Journal Article · https://ieeexplore.ieee.org/document/11234914/

Citations: 0

Abstract

Imaging of targeted organs, such as the urinary bladder, could be transformative for preventive healthcare and early disease diagnosis when used to assess their real-time function. However, wearable and portable ultrasound (US) imaging systems often face constraints related to power consumption, form factor, cost, and signal resolution, particularly for deep tissues like the bladder. High-accuracy platforms with large channel counts can generate data streams of up to 10 GB/s, posing significant challenges in reducing computational complexity, achieving power efficiency, and maintaining wireless connectivity. Recent advancements in wearable US sensors have demonstrated potential for low-power, unobtrusive solutions but often fail to meet the accuracy and efficiency needed in clinical settings. This work presents an algorithm-centric proof of concept that reconstructs missing US channels through field-programmable gate array (FPGA)-accelerated deep learning, effectively doubling the imaging aperture while halving analog front-end requirements. We developed a lightweight U-Net convolutional neural network (L-UNET) with 222,609 parameters, specifically optimized for sparse-array RF data reconstruction. The network is deployed on a deep learning processing unit (DPU) using mixed quantization-aware training (Mixed-QAT) that selectively applies 8-bit integer precision while preserving two critical layers at 16-bit floating point (FP), achieving mean-squared error (MSE) of 1.48 $\times$ 10 compared to 1.22 $\times$ 10 for 32-bit FP. The FPGA implementation leverages a single-core accelerator, executing inference in 221 ms/frame with deterministic latency suitable for real-time reconstruction. By processing only odd-indexed physical channels and inferring even-indexed channels through the convolutional neural network (CNN), our approach maintains B-mode image quality (peak signal-to-noise ratio (PSNR) > 18 dB and structural similarity index (SSIM) > 0.5) while reducing data acquisition complexity. The system achieves 0.918-W average power consumption in a 32-channel configuration, demonstrating that CNN-based sparse-array reconstruction on embedded FPGAs offers a viable path toward fully integrated US monitoring systems.
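The channel-halving idea in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' method: the L-UNET is replaced by plain neighbor averaging as a stand-in, the function names (`split_channels`, `interpolate_missing`, `mse`, `psnr`) are hypothetical, channels are assumed to be numbered from 1, and the 32-row synthetic array is arbitrary test data. The metrics here are computed on raw channel data, whereas the abstract reports PSNR and SSIM on B-mode images, so the numbers are not comparable to the published figures.

```python
import numpy as np


def split_channels(rf):
    """Split RF data of shape (n_channels, n_samples) into kept and dropped rows.

    Assumption: channels are numbered from 1, so odd-indexed channels
    (1, 3, 5, ...) are rows 0, 2, 4, ... of the array.
    """
    return rf[0::2], rf[1::2]


def interpolate_missing(kept):
    """Stand-in for the CNN: estimate each dropped channel from its neighbors."""
    # Right-hand neighbor of each dropped channel; the last dropped channel
    # has none, so the final kept channel is repeated.
    right = np.vstack([kept[1:], kept[-1:]])
    return 0.5 * (kept + right)


def mse(a, b):
    """Mean-squared error between two arrays of equal shape."""
    return float(np.mean((a - b) ** 2))


def psnr(ref, est):
    """PSNR in dB; the peak is taken as the reference's largest absolute value
    (an assumption, since the abstract does not define it)."""
    peak = float(np.max(np.abs(ref)))
    return float(10.0 * np.log10(peak ** 2 / mse(ref, est)))


# Synthetic RF-like data: a sinusoid whose phase drifts slowly across channels.
t = np.linspace(0.0, 1.0, 256)
rf = np.stack([np.sin(2 * np.pi * 5 * t + 0.05 * ch) for ch in range(32)])

kept, dropped = split_channels(rf)
estimate = interpolate_missing(kept)

print("kept:", kept.shape, "dropped:", dropped.shape)
print("MSE:", mse(dropped, estimate), "PSNR (dB):", psnr(dropped, estimate))
```

Swapping `interpolate_missing` for a trained model's forward pass would leave the rest of the evaluation unchanged, which makes neighbor averaging a convenient baseline to compare a learned reconstruction against.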




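Source journal

IEEE Transactions on Ultrasonics, Ferroelectrics, and Frequency Control (Engineering & Technology: Engineering, Electrical & Electronic)

CiteScore: 7.70

Self-citation rate: 16.70%

Articles published: 583

Review time: 4.5 months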

Journal description: IEEE Transactions on Ultrasonics, Ferroelectrics and Frequency Control includes the theory, technology, materials, and applications relating to: (1) the generation, transmission, and detection of ultrasonic waves and related phenomena; (2) medical ultrasound, including hyperthermia, bioeffects, tissue characterization and imaging; (3) ferroelectric, piezoelectric, and piezomagnetic materials, including crystals, polycrystalline solids, films, polymers, and composites; (4) frequency control, timing and time distribution, including crystal oscillators and other means of classical frequency control, and atomic, molecular and laser frequency control standards. Areas of interest range from fundamental studies to the design and/or applications of devices and systems.