PACformer: A Multi-Stage Heterogeneous Convolutional-Vision Transformer for Sparse-View Photoacoustic Tomography Restoration

IF 4.8 · Zone 2 (Computer Science) · JCR Q2 ENGINEERING, ELECTRICAL & ELECTRONIC

IEEE Transactions on Computational Imaging Pub Date : 2025-03-31 DOI:10.1109/TCI.2025.3550716

Li He;Ruitao Chen;Xiangyu Liu;Xu Cao;Shouping Zhu;Yihan Wang

IEEE Transactions on Computational Imaging, vol. 11, pp. 377-388. https://ieeexplore.ieee.org/document/10945768/

Citations: 0

Abstract

Sparse sampling of photoacoustic (PA) signals is a crucial strategy for enhancing the feasibility of photoacoustic tomography (PAT) in clinical settings by reducing system complexity and costs. However, this approach often faces significant artifacts resulting from traditional reconstruction algorithms, underscoring the urgent need for effective solutions. To address the critical challenge of balancing computational efficiency with imaging quality, we introduce PACformer—a novel hybrid model that integrates convolutional neural networks (CNNs) with multi-head self-attentions (MSAs) to improve the reconstruction of sparse-view PAT images. While conventional CNNs excel at local feature extraction, they often struggle to capture long-range dependencies inherent in continuous structures and the diverse artifact patterns present in PAT images. PACformer tackles these limitations through a dual architecture that seamlessly combines MSAs with heterogeneous convolutional layers. Since feature representations differ in size and semantics at various stages of the deep model, PACformer employs specialized blocks for shallow and deep stages. Specifically, it utilizes efficient local convolutions and windowed MSAs for high-resolution feature maps, conditional convolutions (CondConv) integrated with MSAs for advanced feature representation in deeper stages, and Scale-Modulated Convolution combined with CondConv for the bottleneck stage. Experimental results on open-source datasets demonstrate PACformer's superior performance compared to traditional and state-of-the-art networks, validated through ablation studies and attention map visualizations. By effectively modeling both local and global artifacts, PACformer establishes itself as a robust solution for sparse-view PAT reconstruction.
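The abstract names two building blocks that are easier to picture in code: windowed self-attention for high-resolution feature maps, and conditional convolution (CondConv) for deeper stages. The NumPy sketch below is illustrative only and is not the authors' implementation. The function names, tensor shapes, single-head attention, and identity Q/K/V projections are assumptions made to keep the example short. The sigmoid routing over globally pooled features follows the general CondConv formulation, not any detail confirmed by this abstract.

```python
import numpy as np


def window_partition(x, win):
    """Split an (H, W, C) feature map into non-overlapping win x win windows.

    Returns shape (num_windows, win * win, C) so attention can be computed
    inside each window independently instead of over the whole image.
    """
    H, W, C = x.shape
    assert H % win == 0 and W % win == 0, "H and W must be multiples of win"
    x = x.reshape(H // win, win, W // win, win, C)
    x = x.transpose(0, 2, 1, 3, 4)  # (H/win, W/win, win, win, C)
    return x.reshape(-1, win * win, C)


def window_self_attention(tokens):
    """Single-head scaled dot-product attention within each window.

    Assumption: identity Q/K/V projections (a real block would learn them).
    tokens: (num_windows, N, C) -> output of the same shape.
    """
    d = tokens.shape[-1]
    scores = tokens @ tokens.transpose(0, 2, 1) / np.sqrt(d)  # (nW, N, N)
    scores = scores - scores.max(axis=-1, keepdims=True)      # stable softmax
    weights = np.exp(scores)
    weights = weights / weights.sum(axis=-1, keepdims=True)
    return weights @ tokens


def condconv_kernel(x, experts, routing_w):
    """Build a per-input convolution kernel as a mixture of expert kernels.

    x:         (H, W, C) input feature map
    experts:   (E, k, k, C_in, C_out) bank of E expert kernels
    routing_w: (C, E) routing matrix (hypothetical parameter name)
    Returns the mixed kernel (k, k, C_in, C_out) and routing weights (E,).
    """
    pooled = x.mean(axis=(0, 1))                         # global average pool
    alpha = 1.0 / (1.0 + np.exp(-(pooled @ routing_w)))  # sigmoid routing
    kernel = np.tensordot(alpha, experts, axes=1)        # sum_e alpha_e * W_e
    return kernel, alpha
```

With an 8 x 8 map and `win=4`, `window_partition` yields four windows of 16 tokens each, so the attention matrix per window is 16 x 16 instead of 64 x 64 for the full map. That cost argument is the usual reason windowed attention is applied at high resolution.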


Source journal

IEEE Transactions on Computational Imaging Mathematics-Computational Mathematics

CiteScore

8.20

Self-citation rate

7.40%

Articles published

Journal description: The IEEE Transactions on Computational Imaging will publish articles where computation plays an integral role in the image formation process. Papers will cover all areas of computational imaging ranging from fundamental theoretical methods to the latest innovative computational imaging system designs. Topics of interest will include advanced algorithms and mathematical techniques, model-based data inversion, methods for image and signal recovery from sparse and incomplete data, techniques for non-traditional sensing of image data, methods for dynamic information acquisition and extraction from imaging sensors, software and hardware for efficient computation in imaging systems, and highly novel imaging system design.