{"title":"AVP-AP: Self-Supervised Automatic View Positioning in 3D Cardiac CT via Atlas Prompting","authors":"Xiaolin Fan;Yan Wang;Yingying Zhang;Mingkun Bao;Bosen Jia;Dong Lu;Yifan Gu;Jian Cheng;Haogang Zhu","doi":"10.1109/TMI.2025.3554785","DOIUrl":"10.1109/TMI.2025.3554785","url":null,"abstract":"Automatic view positioning is crucial for cardiac computed tomography (CT) examinations, including disease diagnosis and surgical planning. However, it is highly challenging due to individual variability and the large 3D search space. Existing work requires labor-intensive and time-consuming manual annotations to train view-specific models, which are limited to predicting only a fixed set of planes. Moreover, in real clinical scenarios, the challenge of positioning semantic 2D slices of arbitrary orientation within the varying coordinate space of an arbitrary 3D volume remains unsolved. We thus introduce a novel framework, AVP-AP, the first to use Atlas Prompting for self-supervised Automatic View Positioning in the 3D CT volume. Specifically, this paper first proposes an atlas prompting method, which generates a 3D canonical atlas and trains a network to map slices into their corresponding positions in the atlas space in a self-supervised manner. Then, guided by atlas prompts corresponding to the given query images in a reference CT, we identify the coarse positions of slices in the target CT volume using a rigid transformation between the 3D atlas and the target CT volume, effectively reducing the search space. Finally, we refine the coarse positions by maximizing the similarity between the predicted slices and the query images in the feature space of a given foundation model. Our framework is more flexible and efficient than existing methods, outperforming them by 19.8% in average structural similarity (SSIM) for arbitrary view positioning and achieving a 9% SSIM improvement in the two-chamber view compared to four radiologists. 
Meanwhile, experiments on a public dataset validate our framework’s generalizability.","PeriodicalId":94033,"journal":{"name":"IEEE transactions on medical imaging","volume":"44 7","pages":"2921-2932"},"PeriodicalIF":0.0,"publicationDate":"2025-03-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143713037","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Phase-Locked Time-Stretch Optical Coherence Tomography for Contrast-Enhanced Retinal Microangiography","authors":"Gyeong Hun Kim;Seongjin Bak;Hyung-Hoi Kim;Jun Geun Shin;Tae Joong Eom;Chang-Seok Kim;Hwidon Lee","doi":"10.1109/TMI.2025.3555112","DOIUrl":"10.1109/TMI.2025.3555112","url":null,"abstract":"Optical coherence tomography angiography has transformed retinal vascular imaging by providing non-invasive, high-resolution visualization. However, achieving an optimal balance between field of view, resolution, and three-dimensional microvasculature contrast, particularly in deeper retinal layers, remains challenging. A phase-locked time-stretch optical coherence tomography microangiography system is developed to address these limitations with a 5-MHz A-line rate and sub-nm phase sensitivity. Utilizing a dual chirped fiber Bragg grating architecture, the swept-source laser achieves an extended coherence length of ~10 mm and a 102-nm bandwidth. A time-stretch analog-to-digital converter overcomes the limitations of conventional multi-MHz optical coherence tomography systems, ensuring a 2-mm imaging depth in air with high spatial resolution. The proposed system enables high-contrast, depth-encoded mapping of key retinal structures, including the superficial and deep capillary plexuses and the choriocapillaris. 
Compared to a state-of-the-art system, the proposed approach demonstrates enhanced resolution, improved contrast, and faster imaging speeds, strengthening its potential for diagnosing and monitoring retinal and systemic diseases such as age-related macular degeneration, diabetic retinopathy, and Alzheimer’s disease.","PeriodicalId":94033,"journal":{"name":"IEEE transactions on medical imaging","volume":"44 7","pages":"2906-2920"},"PeriodicalIF":0.0,"publicationDate":"2025-03-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=10942462","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143713038","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Spatio-Temporal and Retrieval-Augmented Modeling for Chest X-Ray Report Generation","authors":"Yan Yang;Xiaoxing You;Ke Zhang;Zhenqi Fu;Xianyun Wang;Jiajun Ding;Jiamei Sun;Zhou Yu;Qingming Huang;Weidong Han;Jun Yu","doi":"10.1109/TMI.2025.3554498","DOIUrl":"10.1109/TMI.2025.3554498","url":null,"abstract":"Chest X-ray report generation has attracted increasing research attention. However, most existing methods neglect the temporal information and typically generate reports conditioned on a fixed number of images. In this paper, we propose STREAM: Spatio-Temporal and REtrieval-Augmented Modeling for automatic chest X-ray report generation. It mimics clinical diagnosis by integrating current and historical studies to interpret the present condition (temporal), with each study containing images from multiple views (spatial). Concretely, our STREAM is built upon an encoder-decoder architecture, utilizing a large language model (LLM) as the decoder. Overall, spatio-temporal visual dynamics are packed as visual prompts and regional semantic entities are retrieved as textual prompts. First, a token packer is proposed to capture condensed spatio-temporal visual dynamics, enabling the flexible fusion of images from current and historical studies. Second, to augment the generation with existing knowledge and regional details, a progressive semantic retriever is proposed to retrieve semantic entities from a preconstructed knowledge bank as heuristic text prompts. The knowledge bank is constructed to encapsulate anatomical chest X-ray knowledge into structured entities, each linked to a specific chest region. Extensive experiments on public datasets have shown the state-of-the-art performance of our method. 
Related codes and the knowledge bank are available at <uri>https://github.com/yangyan22/STREAM</uri>.","PeriodicalId":94033,"journal":{"name":"IEEE transactions on medical imaging","volume":"44 7","pages":"2892-2905"},"PeriodicalIF":0.0,"publicationDate":"2025-03-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143712667","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"High Sensitivity Photoacoustic Imaging by Learning From Noisy Data","authors":"Xu Tang;Jiangbo Chen;Zheng Qu;Jingyi Zhu;Mohammadreza Amjadian;Mingxuan Yang;Yingpeng Wan;Lidai Wang","doi":"10.1109/TMI.2025.3552692","DOIUrl":"10.1109/TMI.2025.3552692","url":null,"abstract":"Photoacoustic imaging (PAI) is a high-resolution biomedical imaging technology for the non-invasive detection of a broad range of chromophores at multiple scales and depths. However, limited by low chromophore concentration, weak signals in deep tissue, or various noise sources, the signal-to-noise ratio of photoacoustic images may be compromised in many biomedical applications. Although improvements in hardware and computational methods have been made to address this problem, they have not been widely adopted due to either high costs or an inability to generalize across different datasets. Here, we present a self-supervised deep learning method to increase the signal-to-noise ratio of photoacoustic images using noisy data only. Because this method does not require expensive ground truth data for training, it can be easily implemented across scanning microscopic and computed tomographic data acquired with various photoacoustic imaging systems. In vivo results show that our method renders vascular details that were completely submerged in noise clearly visible, increases the signal-to-noise ratio by up to 12-fold, doubles the imaging depth, and enables high-contrast imaging of deep tumors. 
We believe this method can be readily applied to many preclinical and clinical applications.","PeriodicalId":94033,"journal":{"name":"IEEE transactions on medical imaging","volume":"44 7","pages":"2868-2877"},"PeriodicalIF":0.0,"publicationDate":"2025-03-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143661336","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Replace2Self: Self-Supervised Denoising Based on Voxel Replacing and Image Mixing for Diffusion MRI","authors":"Linhai Wu;Lihui Wang;Zeyu Deng;Yuemin Zhu;Hongjiang Wei","doi":"10.1109/TMI.2025.3552611","DOIUrl":"10.1109/TMI.2025.3552611","url":null,"abstract":"A low signal-to-noise ratio (SNR) remains one of the limitations of diffusion-weighted (DW) imaging. Suppressing the influence of noise on the subsequent analysis of tissue microstructure is still challenging. This work proposed a novel self-supervised learning model, Replace2Self, to effectively reduce spatially correlated noise in DW images. Specifically, a voxel replacement strategy based on similar block matching in Q-space was proposed to destroy the correlations of noise in a DW image along one diffusion gradient direction. To alleviate the signal gap caused by the voxel replacement, an image mixing strategy based on complementary masks was designed to generate two different noisy DW images. A denoising network was then trained by taking these two noisy DW images as input and the non-correlated noisy DW image after voxel replacement as the learning target. To promote the denoising performance, a complementary mask mixing consistency loss and an inverse replacement regularization loss were also proposed. Through comparisons against several existing DW image denoising methods on extensive simulated data with different noise distributions, noise levels, and b-values, as well as on acquired datasets and through ablation experiments, we verified the effectiveness of the proposed method. Regardless of the noise distribution and noise level, the proposed method achieved the highest PSNR, which was at least 1.9% higher than the second-best method when the noise level reached 10%. 
Furthermore, our method has superior generalization ability due to the use of the proposed strategies.","PeriodicalId":94033,"journal":{"name":"IEEE transactions on medical imaging","volume":"44 7","pages":"2878-2891"},"PeriodicalIF":0.0,"publicationDate":"2025-03-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143660090","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Blood Oxygenation Quantification in Multispectral Photoacoustic Tomography Using a Convex Cone Approach","authors":"Chuhua Wu;Hongzhi Zuo;Manxiu Cui;Handi Deng;Yuwen Chen;Xuanhao Wang;Bangyan Wang;Cheng Ma","doi":"10.1109/TMI.2025.3551744","DOIUrl":"10.1109/TMI.2025.3551744","url":null,"abstract":"Multispectral photoacoustic tomography (PAT) can create high spatial and temporal resolution images of oxygen saturation (sO2) distribution in deep tissue. However, the unknown distributions of photon absorption and scattering introduce complex modulations to the photoacoustic (PA) spectra, dramatically reducing the accuracy of sO2 quantification. In this study, a rigorous light transport model was employed to unveil that the PA spectra corresponding to distinct sO2 values can be constrained within separate convex cones (CCs). Based on the CC model, sO2 estimation is achieved by identifying the CC nearest to the measured data through a modified Gilbert-Johnson-Keerthi (GJK) algorithm. The CC method combines a rigorous physical model with a data-driven approach, and shows outstanding robustness in numerical, phantom, and in vivo imaging experiments validated against ground truth measurements. The average sO2 estimation error is only approximately 3% in in vivo human experiments, underscoring its potential for clinical application. 
All of our computer codes and data are publicly available on GitHub.","PeriodicalId":94033,"journal":{"name":"IEEE transactions on medical imaging","volume":"44 7","pages":"2842-2853"},"PeriodicalIF":0.0,"publicationDate":"2025-03-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143652826","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"DenseFormer-MoE: A Dense Transformer Foundation Model with Mixture of Experts for Multi-Task Brain Image Analysis.","authors":"Rizhi Ding, Hui Lu, Manhua Liu","doi":"10.1109/TMI.2025.3551514","DOIUrl":"10.1109/TMI.2025.3551514","url":null,"abstract":"<p>Deep learning models have been widely investigated for computing and analyzing brain images across various downstream tasks such as disease diagnosis and age regression. Most existing models are tailored for specific tasks and diseases, posing a challenge in developing a foundation model for diverse tasks. This paper proposes a Dense Transformer Foundation Model with Mixture of Experts (DenseFormer-MoE), which integrates a dense convolutional network, a Vision Transformer and a Mixture of Experts (MoE) to progressively learn and consolidate local and global features from T1-weighted magnetic resonance images (sMRI) for multiple tasks including diagnosing multiple brain diseases and predicting brain age. First, a foundation model is built by combining a Vision Transformer with a DenseNet, which are pre-trained with a Masked Autoencoder and self-supervised learning to enhance the generalization of feature representations. Then, to mitigate optimization conflicts in multi-task learning, the MoE is designed to dynamically select the most appropriate experts for each task. Finally, our method is evaluated on multiple renowned brain imaging datasets including UK Biobank (UKB), Alzheimer's Disease Neuroimaging Initiative (ADNI), and Parkinson's Progression Markers Initiative (PPMI). 
Experimental results and comparisons demonstrate that our method achieves promising performance in brain age prediction and brain disease diagnosis.</p>","PeriodicalId":94033,"journal":{"name":"IEEE transactions on medical imaging","volume":"PP ","pages":""},"PeriodicalIF":0.0,"publicationDate":"2025-03-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143631136","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Speckle Denoising of Dynamic Contrast-Enhanced Ultrasound Using Low-Rank Tensor Decomposition","authors":"Metin Calis;Massimo Mischi;Alle-Jan van der Veen;Borbala Hunyadi","doi":"10.1109/TMI.2025.3551660","DOIUrl":"10.1109/TMI.2025.3551660","url":null,"abstract":"Dynamic contrast-enhanced ultrasound (DCEUS) is an imaging modality for assessing microvascular perfusion and dispersion kinetics. However, the presence of speckle noise may hamper the quantitative analysis of the contrast kinetics. Common speckle denoising techniques based on low-rank approximations typically model the speckle noise as white Gaussian noise (WGN) after the log transformation and apply matrix-based algorithms. We address the high dimensionality of the 4D DCEUS data and apply low-rank tensor decomposition techniques to denoise speckles. Although there are many tensor decompositions that can describe low rankness, we limit our research to multilinear rank and tubal rank. We introduce a gradient-based extension of the multilinear singular value decomposition to model low multilinear rankness, assuming that the log-transformed speckle noise follows a Fisher-Tippett distribution. In addition, we apply an algorithm based on tensor singular value decomposition to model low tubal rankness, assuming that the log-transformed speckle noise is WGN with sparse outliers. The effectiveness of the methods is evaluated through simulations and phantom studies. Additionally, the tensor-based algorithms’ real-world performance is assessed using DCEUS prostate recordings. Comparative analyses with existing DCEUS denoising literature are conducted, and the algorithms’ capabilities are showcased in the context of prostate cancer classification. The addition of the Fisher-Tippett distribution did not improve the results of tr-MLSVD in the in vivo case. 
However, most cancer markers are more distinguishable when using a tensor denoising technique than with state-of-the-art approaches.","PeriodicalId":94033,"journal":{"name":"IEEE transactions on medical imaging","volume":"44 7","pages":"2854-2867"},"PeriodicalIF":0.0,"publicationDate":"2025-03-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143631137","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Optical Flow-Enhanced Mamba U-Net for Cardiac Phase Detection in Ultrasound Videos","authors":"Yuhuan Lu;Guanghua Tan;Bin Pu;Pak-Hei Yeung;Hang Wang;Shengli Li;Jagath C. Rajapakse;Kenli Li","doi":"10.1109/TMI.2025.3550731","DOIUrl":"10.1109/TMI.2025.3550731","url":null,"abstract":"The detection of cardiac phase in ultrasound videos, identifying end-systolic (ES) and end-diastolic (ED) frames, is a critical step in assessing cardiac function, monitoring structural changes, and diagnosing congenital heart disease. Current popular methods use recurrent neural networks to track dependencies over long sequences for cardiac phase detection, but often overlook the short-term motion of cardiac valves that sonographers rely on. In this paper, we propose a novel optical flow-enhanced Mamba U-Net framework, designed to utilize both short-term motion and long-term dependencies to detect the cardiac phase in ultrasound videos. We utilize optical flow to capture the short-term motion of cardiac muscles and valves between adjacent frames, enhancing the input video. The Mamba layer is employed to track long-term dependencies across cardiac cycles. We then develop regression branches using the U-Net architecture, which integrates short-term and long-term information while extracting multi-scale features. Using this method, we can generate regression scores for each frame and identify keyframes (i.e., ES and ED frames). Additionally, we design a keyframe weighted loss function to guide the network to focus more on keyframes rather than intermediate period frames. Our method demonstrates superior performance compared to advanced baseline methods, achieving frame mismatches of 1.465 frames for ES and 0.842 frames for ED in the Fetal Echocardiogram dataset, where heart rates are higher and phase changes occur rapidly, and 2.444 frames and 2.072 frames in the publicly available adult Echonet-Dynamic dataset. 
Its accuracy and robustness in both fetal and adult datasets highlight its potential for clinical application.","PeriodicalId":94033,"journal":{"name":"IEEE transactions on medical imaging","volume":"44 7","pages":"2831-2841"},"PeriodicalIF":0.0,"publicationDate":"2025-03-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143607946","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"UW-DNeRF: Deformable Soft Tissue Reconstruction With Uncertainty-Guided Depth Supervision and Local Information Integration","authors":"Jiwei Shan;Zixin Zhang;Hao Li;Cheng-Tai Hsieh;Yirui Li;Wenhua Wu;Hesheng Wang","doi":"10.1109/TMI.2025.3550269","DOIUrl":"10.1109/TMI.2025.3550269","url":null,"abstract":"Reconstructing deformable soft tissues from endoscopic videos is a critical yet challenging task. Leveraging depth priors, deformable implicit neural representations have seen significant advancements in this field. However, depth priors from pre-trained depth estimation models are often coarse, and inaccurate depth supervision can severely impair the performance of these neural networks. Moreover, existing methods overlook local similarities in input sequences, which restricts their effectiveness in capturing local details and tissue deformations. In this paper, we introduce UW-DNeRF, a novel approach utilizing neural radiance fields for high-quality reconstruction of deformable tissues. We propose an uncertainty-guided depth supervision strategy to mitigate the impact of inaccurate depth information. This strategy relaxes hard depth constraints and unlocks the potential of implicit neural representations. In addition, we design a local window-based information sharing scheme. This scheme employs local window and keyframe deformation networks to construct deformations with local awareness and enhances the model’s ability to capture fine details. We demonstrate the superiority of our method over state-of-the-art approaches on synthetic and in vivo endoscopic datasets. 
Code is available at: <uri>https://github.com/IRMVLab/UW-DNeRF</uri>.","PeriodicalId":94033,"journal":{"name":"IEEE transactions on medical imaging","volume":"44 7","pages":"2808-2818"},"PeriodicalIF":0.0,"publicationDate":"2025-03-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143599611","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}