PACformer: A Multi-Stage Heterogeneous Convolutional-Vision Transformer for Sparse-View Photoacoustic Tomography Restoration
Li He; Ruitao Chen; Xiangyu Liu; Xu Cao; Shouping Zhu; Yihan Wang
IEEE Transactions on Computational Imaging, vol. 11, pp. 377-388. Published 2025-03-31. DOI: 10.1109/TCI.2025.3550716
https://ieeexplore.ieee.org/document/10945768/
Journal impact factor: 4.2; JCR Q2 (Engineering, Electrical & Electronic)
Citations: 0
Abstract
Sparse sampling of photoacoustic (PA) signals is a crucial strategy for making photoacoustic tomography (PAT) more feasible in clinical settings, because it reduces system complexity and cost. However, traditional reconstruction algorithms produce significant artifacts on sparsely sampled data, underscoring the need for effective solutions. To address the challenge of balancing computational efficiency with imaging quality, we introduce PACformer, a novel hybrid model that integrates convolutional neural networks (CNNs) with multi-head self-attention modules (MSAs) to improve the reconstruction of sparse-view PAT images. While conventional CNNs excel at local feature extraction, they often struggle to capture the long-range dependencies inherent in continuous structures and the diverse artifact patterns present in PAT images. PACformer tackles these limitations through a dual architecture that combines MSAs with heterogeneous convolutional layers. Since feature representations differ in size and semantics at various stages of the deep model, PACformer employs specialized blocks for shallow and deep stages. Specifically, it uses efficient local convolutions and windowed MSAs for high-resolution feature maps, conditional convolutions (CondConv) integrated with MSAs for advanced feature representation in deeper stages, and Scale-Modulated Convolution combined with CondConv for the bottleneck stage. Experimental results on open-source datasets demonstrate PACformer's superior performance compared to traditional and state-of-the-art networks, and ablation studies and attention map visualizations support these results. By effectively modeling both local and global artifacts, PACformer establishes itself as a robust solution for sparse-view PAT reconstruction.
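The abstract names conditional convolution (CondConv) as the building block for the deeper stages but does not describe its mechanics. As a rough illustration only, the sketch below follows the general CondConv idea: a small routing function turns globally pooled input features into per-example weights, those weights mix several "expert" kernels into one example-specific kernel, and that kernel is then applied as an ordinary convolution. This is not PACformer's implementation. The function names, the single output channel, the sigmoid routing, and the nested-list tensor layout are all assumptions made to keep the example short and dependency-free.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def routing_weights(x, routing):
    """x: input as [C][H][W]; routing: hypothetical routing matrix [C][E].

    Global-average-pools each channel, projects to E logits, and squashes
    them with a sigmoid, giving one weight in (0, 1) per expert kernel.
    """
    pooled = [sum(sum(row) for row in ch) / (len(ch) * len(ch[0])) for ch in x]
    n_experts = len(routing[0])
    return [
        sigmoid(sum(pooled[c] * routing[c][e] for c in range(len(x))))
        for e in range(n_experts)
    ]


def mix_kernels(experts, alphas):
    """experts: [E][C][k][k]. Returns the example-specific kernel [C][k][k]."""
    n_ch, k = len(experts[0]), len(experts[0][0])
    return [
        [
            [sum(a * ex[c][i][j] for a, ex in zip(alphas, experts)) for j in range(k)]
            for i in range(k)
        ]
        for c in range(n_ch)
    ]


def conv2d_valid(x, kernel):
    """Cross-correlation with one output channel, stride 1, no padding."""
    n_ch, height, width = len(x), len(x[0]), len(x[0][0])
    k = len(kernel[0])
    return [
        [
            sum(
                x[c][i + di][j + dj] * kernel[c][di][dj]
                for c in range(n_ch)
                for di in range(k)
                for dj in range(k)
            )
            for j in range(width - k + 1)
        ]
        for i in range(height - k + 1)
    ]


def cond_conv(x, experts, routing):
    """Mix the expert kernels per example, then convolve once."""
    alphas = routing_weights(x, routing)
    return conv2d_valid(x, mix_kernels(experts, alphas))
```

Because convolution is linear in the kernel, mixing the kernels first gives the same result as running every expert convolution and mixing the outputs, at the cost of a single convolution. That is the usual argument for this design, though whether PACformer follows it in exactly this form is not stated in the abstract.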
About the journal:
The IEEE Transactions on Computational Imaging will publish articles where computation plays an integral role in the image formation process. Papers will cover all areas of computational imaging ranging from fundamental theoretical methods to the latest innovative computational imaging system designs. Topics of interest will include advanced algorithms and mathematical techniques, model-based data inversion, methods for image and signal recovery from sparse and incomplete data, techniques for non-traditional sensing of image data, methods for dynamic information acquisition and extraction from imaging sensors, software and hardware for efficient computation in imaging systems, and highly novel imaging system design.