{"title":"Lung Cancer Screening Classification by Sequential Multi-Instance Learning (SMILE) Framework With Multiple CT Scans","authors":"Wangyuan Zhao;Yuanyuan Fu;Yujia Shen;Jingchen Ma;Lu Zhao;Xiaolong Fu;Puming Zhang;Jun Zhao","doi":"10.1109/TMI.2025.3559143","DOIUrl":"10.1109/TMI.2025.3559143","url":null,"abstract":"Lung cancer screening with computed tomography (CT) scans can effectively improve the survival rate through the early detection of lung cancer, which typically identified in the form of pulmonary nodules. Multiple sequential CT images are helpful to determine nodule malignancy and play a significant role to detect lung cancers. It is crucial to develop effective lung cancer classification algorithms to achieve accurate results from multiple images without nodule location annotations, which can free radiologists from the burden of labeling nodule locations before predicting malignancy. In this study, we proposed the sequential multi-instance learning (SMILE) framework to predict high-risk lung cancer patients with multiple CT scans. SMILE included two steps. The first step was nodule instance generation. We employed the nodule detection algorithm with image category transformation to identify nodule instance locations within the entire lung images. The second step was nodule malignancy prediction. Models were supervised by patient-level annotations, without the exact locations of nodules. We embedded multi-instance learning with temporal feature extraction into a fusion framework, which effectively promoted the classification performance. SMILE was evaluated by five-fold cross-validation on a 925-patient dataset (182 malignant, 743 benign). Every patient had three CT scans, of which the interval period was about one year. Experimental results showed the potential of SMILE to free radiologists from labeling nodule locations. The source code will be available at <uri>https://github.com/wyzhao27/SMILE</uri>.","PeriodicalId":94033,"journal":{"name":"IEEE transactions on medical imaging","volume":"44 8","pages":"3151-3161"},"PeriodicalIF":0.0,"publicationDate":"2025-04-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143813522","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Improving Representation of High-Frequency Components for Medical Visual Foundation Models","authors":"Yuetan Chu;Yilan Zhang;Zhongyi Han;Changchun Yang;Longxi Zhou;Gongning Luo;Chao Huang;Xin Gao","doi":"10.1109/TMI.2025.3559402","DOIUrl":"10.1109/TMI.2025.3559402","url":null,"abstract":"Foundation models have attracted significant attention for their impressive generalizability across diverse downstream tasks. However, they are demonstrated to exhibit great limitations in representing high-frequency components and fine-grained details. In many medical imaging tasks, precise representation of such information is crucial due to the inherently intricate anatomical structures, sub-visual features, and complex boundaries involved. Consequently, the limited representation of prevalent foundation models can result in considerable performance degradation or even failure in these tasks. To address these challenges, we propose a novel pretraining strategy for both 2D images and 3D volumes, named Frequency-advanced Representation Autoencoder (Frepa). Through high-frequency masking and low-frequency perturbation combined with embedding consistency learning, Frepa encourages the encoder to effectively represent and preserve high-frequency components in the image embeddings. Additionally, we introduce an innovative histogram-equalized image masking strategy, extending the Masked Autoencoder approach beyond ViT to other architectures such as Swin-Transformer and convolutional networks. We develop Frepa across nine medical modalities and validate it on 32 downstream tasks for both 2D images and 3D volumes. Without fine-tuning, Frepa can outperform other self-supervised pretraining methods and, in some cases, even surpasses task-specific foundation models. This improvement is particularly significant for tasks involving fine-grained details, such as achieving up to a +15% increase in dice score for retina vessel segmentation and a +8% increase in IoU for lung tumor detection. Further experiment quantitatively reveals that Frepa enables superior high-frequency representations and preservation in the embeddings, underscoring its potential for developing more generalized and universal medical image foundation models.","PeriodicalId":94033,"journal":{"name":"IEEE transactions on medical imaging","volume":"44 8","pages":"3196-3209"},"PeriodicalIF":0.0,"publicationDate":"2025-04-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143813518","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Multiparametric Ultrasound Breast Tumors Diagnosis Within BI-RADS Category 4 via Feature Disentanglement and Cross-Fusion","authors":"Zhikai Ruan;Canxu Song;Pengfei Xu;Chaoyu Wang;Jing Zhao;Meng Chen;Suoni Li;Qiang Su;Xiaozhen Zhuo;Yue Wu;Mingxi Wan;Diya Wang","doi":"10.1109/TMI.2025.3558786","DOIUrl":"10.1109/TMI.2025.3558786","url":null,"abstract":"BI-RADS category 4 is the diagnostic threshold between benign and malignant breast tumors and is critical in determining clinical breast cancer treatment options. However, breast tumors within BI-RADS category 4 tend to show subtle or contradictory differences between benign and malignant on B-mode images, leading to uncertainty in clinical diagnosis. Recently, many deep learning studies have realized the value of multimodal and multiparametric ultrasound in the diagnosis of breast tumors. However, due to the heterogeneity of data, how to effectively represent and fuse common and specific features from multiple sources of information is an open question, which is often overlooked by existing computer-aided diagnosis methods. To address these problems, we propose a novel framework that integrates multiparametric ultrasound information (B-mode images, Nakagami parametric images, and semantic attributes) to assist the diagnosis of BI-RADS 4 breast tumors. The framework extracts and disentangles common and specific features from B-mode and Nakagami parametric images based on a dual-branch Transformer-CNN encoder. Meanwhile, we propose a novel feature disentanglement loss to further ensure the complementarity and consistency of multiparametric features. In addition, we construct a multiparameter cross-fusion module to integrate the high-level features extracted from multiparametric images and semantic attributes. Extensive experiments on the multicenter multiparametric dataset demonstrated the superiority of the proposed framework over the state-of-the-art methods in the diagnosis for BI-RADS 4 breast tumors. The code is available at <uri>https://github.com/rzk-code/MUBTD</uri>","PeriodicalId":94033,"journal":{"name":"IEEE transactions on medical imaging","volume":"44 7","pages":"3064-3075"},"PeriodicalIF":0.0,"publicationDate":"2025-04-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143805819","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Space-Time Encoded Modulation for High-Fidelity Diffuse Optical Imaging","authors":"Ben Wiesel;Shlomi Arnon","doi":"10.1109/TMI.2025.3558865","DOIUrl":"10.1109/TMI.2025.3558865","url":null,"abstract":"Diffuse optical imaging (DOI) offers valuable insights into scattering mediums, but the quest for high-resolution imaging often requires dense sampling strategies, leading to higher imaging errors and lengthy acquisition times. This work introduces Space-Time Encoded Modulation (STEM), a novel light modulation scheme enabling low-noise, high-resolution imaging with single-pixel detectors. In STEM, a laser illuminates the sample, and the transmitted light is detected using a single pixel detector. The detected image is partitioned into a two-dimensional array of sub-images, each encoded with a unique quasi-orthogonal code. These coded sub-images represent light transmission at specific locations along the sample boundary. A single-pixel detector then measures their combined transmission. By virtue of their quasi-orthogonality, the relative strength of each sub-image can be measured, enabling image formation. In this paper, we present a comprehensive mathematical description and experimental validation of the STEM method. Compared to traditional raster scanning, STEM significantly enhances imaging quality, reducing imaging errors by up to 60% and yielding a 3.5-fold increase in reconstruction contrast.","PeriodicalId":94033,"journal":{"name":"IEEE transactions on medical imaging","volume":"44 9","pages":"3717-3726"},"PeriodicalIF":0.0,"publicationDate":"2025-04-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143805820","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Revealing Cortical Spreading Pathway of Neuropathological Events by Neural Optimal Mass Transport","authors":"Tingting Dan;Yanquan Huang;Yang Yang;Guorong Wu","doi":"10.1109/TMI.2025.3558691","DOIUrl":"10.1109/TMI.2025.3558691","url":null,"abstract":"Positron Emission Tomography (PET) is essential for understanding the pathophysiological mechanisms underlying neurodegenerative diseases like Alzheimer’s disease (AD). However, existing approaches primarily focus on stereotypical patterns of pathology burden, lacking the ability to elucidate the underlying propagation mechanisms by which pathologies spread throughout the brain over time. Given that many neurodegenerative diseases exhibit prion-like pathology spread, it is essential to uncover the spot-to-spot flow field between consecutive PET snapshots. To address this, we reformulate the problem of identifying latent cortical propagation pathways of neuropathological burden within the well-established framework of optimal mass transport (OMT). In this formulation, the dynamic spreading of pathology across longitudinal PET scans is inherently constrained by the geometry of the brain cortex. To solve this problem, we introduce a variational framework that characterizes the dynamical system of pathology propagation in the brain, ultimately reducing to a Wasserstein geodesic between two density distributions of pathology accumulation. Furthermore, we hypothesize that a well-characterized mechanism of pathology propagation will enable the prediction of future pathology accumulation at the individual level, paving the way for personalized disease progression modeling. Building on the principles of physics-informed deep models, we derive the governing equation of the underlying OMT model and introduce an explainable, generative adversarial network-inspired framework. Our approach (1) parameterizes population-level OMT dynamics through a flow adjuster and (2) predicts the spreading flow in unseen subjects using a trained flow driver. We validate the accuracy of our model on publicly available datasets, demonstrating its effectiveness in forecasting future pathology accumulation. Since our deep model adheres to the second law of thermodynamics, we further explore the propagation dynamics of tau aggregates throughout the progression of AD. In contrast to traditional methods, our physics-informed approach enhances both accuracy and interpretability, demonstrating its potential to reveal novel neurobiological mechanisms driving disease progression.","PeriodicalId":94033,"journal":{"name":"IEEE transactions on medical imaging","volume":"44 7","pages":"3100-3109"},"PeriodicalIF":0.0,"publicationDate":"2025-04-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143797729","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"High-Frequency Modulated Transformer for Multi-Contrast MRI Super-Resolution","authors":"Juncheng Li;Hanhui Yang;Qiaosi Yi;Minhua Lu;Jun Shi;Tieyong Zeng","doi":"10.1109/TMI.2025.3558164","DOIUrl":"10.1109/TMI.2025.3558164","url":null,"abstract":"Accelerating the MRI acquisition process is always a key issue in modern medical practice, and great efforts have been devoted to fast MR imaging. Among them, multi-contrast MR imaging is a promising and effective solution that utilizes and combines information from different contrasts. However, existing methods may ignore the importance of the high-frequency priors among different contrasts. Moreover, they may lack an efficient method to fully utilize the information from the reference contrast. In this paper, we propose a lightweight and accurate High-frequency Modulated Transformer (HFMT) for multi-contrast MRI super-resolution. The key ideas of HFMT are high-frequency prior enhancement and its fusion with global features. Specifically, we employ an enhancement module to enhance and amplify the high-frequency priors in the reference and target modalities. In addition, we utilize the Rectangle Window Transformer Block (RWTB) to capture global information in the target contrast. Meanwhile, we propose a novel cross-attention mechanism to fuse the high-frequency enhanced features with the global features sequentially, which assists the network in recovering clear texture details from the low-resolution inputs. Extensive experiments show that our proposed method can reconstruct high-quality images with fewer parameters and faster inference time.","PeriodicalId":94033,"journal":{"name":"IEEE transactions on medical imaging","volume":"44 7","pages":"3089-3099"},"PeriodicalIF":0.0,"publicationDate":"2025-04-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=10949290","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143782413","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Corrections to “Multi-Label Generalized Zero Shot Chest X-Ray Classification By Combining Image-Text Information With Feature Disentanglement”","authors":"Dwarikanath Mahapatra;Antonio Jimeno Yepes;Behzad Bozorgtabar;Sudipta Roy;Zongyuan Ge;Mauricio Reyes","doi":"10.1109/TMI.2025.3549666","DOIUrl":"10.1109/TMI.2025.3549666","url":null,"abstract":"Presents corrections to the paper, (Corrections to “Multi-Label Generalized Zero Shot Chest X-Ray Classification By Combining Image-Text Information With Feature Disentanglement”).","PeriodicalId":94033,"journal":{"name":"IEEE transactions on medical imaging","volume":"44 4","pages":"1984-1985"},"PeriodicalIF":0.0,"publicationDate":"2025-04-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=10948537","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143775404","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Correntropy-Based Improper Likelihood Model for Robust Electrophysiological Source Imaging","authors":"Yuanhao Li;Badong Chen;Zhongxu Hu;Keita Suzuki;Wenjun Bai;Yasuharu Koike;Okito Yamashita","doi":"10.1109/TMI.2025.3557528","DOIUrl":"10.1109/TMI.2025.3557528","url":null,"abstract":"Bayesian learning provides a unified skeleton to solve the electrophysiological source imaging task. From this perspective, existing source imaging algorithms utilize the Gaussian assumption for the observation noise to build the likelihood function for Bayesian inference. However, the electromagnetic measurements of brain activity are usually affected by miscellaneous artifacts, leading to a potentially non-Gaussian distribution for the observation noise. Hence the conventional Gaussian likelihood model is a suboptimal choice for the real-world source imaging task. In this study, we aim to solve this problem by proposing a new likelihood model which is robust with respect to non-Gaussian noises. Motivated by the robust maximum correntropy criterion, we propose a new improper distribution model concerning the noise assumption. This new noise distribution is leveraged to structure a robust likelihood function and integrated with hierarchical prior distributions to estimate source activities by variational inference. In particular, the score matching is adopted to determine the hyperparameters for the improper likelihood model. A comprehensive performance evaluation is performed to compare the proposed noise assumption to the conventional Gaussian model. Simulation results show that, the proposed method can realize more precise source reconstruction by designing known ground-truth. The real-world dataset also demonstrates the superiority of our new method with the visual perception task. This study provides a new backbone for Bayesian source imaging, which would facilitate its application using real-world noisy brain signal.","PeriodicalId":94033,"journal":{"name":"IEEE transactions on medical imaging","volume":"44 7","pages":"3076-3088"},"PeriodicalIF":0.0,"publicationDate":"2025-04-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143775554","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Towards Better Cephalometric Landmark Detection With Diffusion Data Generation","authors":"Dongqian Guo;Wencheng Han;Pang Lyu;Yuxi Zhou;Jianbing Shen","doi":"10.1109/TMI.2025.3557430","DOIUrl":"10.1109/TMI.2025.3557430","url":null,"abstract":"Cephalometric landmark detection is essential for orthodontic diagnostics and treatment planning. Nevertheless, the scarcity of samples in data collection and the extensive effort required for manual annotation have significantly impeded the availability of diverse datasets. This limitation has restricted the effectiveness of deep learning-based detection methods, particularly those based on large-scale vision models. To address these challenges, we have developed an innovative data generation method capable of producing diverse cephalometric X-ray images along with corresponding annotations without human intervention. To achieve this, our approach initiates by constructing new cephalometric landmark annotations using anatomical priors. Then, we employ a diffusion-based generator to create realistic X-ray images that correspond closely with these annotations. To achieve precise control in producing samples with different attributes, we introduce a novel prompt cephalometric X-ray image dataset. This dataset includes real cephalometric X-ray images and detailed medical text prompts describing the images. By leveraging these detailed prompts, our method improves the generation process to control different styles and attributes. Facilitated by the large, diverse generated data, we introduce large-scale vision detection models into the cephalometric landmark detection task to improve accuracy. Experimental results demonstrate that training with the generated data substantially enhances the performance. Compared to methods without using the generated data, our approach improves the Success Detection Rate (SDR) by 6.5%, attaining a notable 82.2%. All code and data are available at: <uri>https://um-lab.github.io/cepha-generation/</uri>","PeriodicalId":94033,"journal":{"name":"IEEE transactions on medical imaging","volume":"44 7","pages":"2784-2794"},"PeriodicalIF":0.0,"publicationDate":"2025-04-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143775359","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Balancing Multi-Target Semi-Supervised Medical Image Segmentation With Collaborative Generalist and Specialists","authors":"You Wang;Zekun Li;Lei Qi;Qian Yu;Yinghuan Shi;Yang Gao","doi":"10.1109/TMI.2025.3557537","DOIUrl":"10.1109/TMI.2025.3557537","url":null,"abstract":"Despite the promising performance achieved by current semi-supervised models in segmenting individual medical targets, many of these models suffer a notable decrease in performance when tasked with the simultaneous segmentation of multiple targets. A vital factor could be attributed to the imbalanced scales among different targets: during simultaneously segmenting multiple targets, large targets dominate the loss, leading to small targets being misclassified as larger ones. To this end, we propose a novel method, which consists of a Collaborative Generalist and several Specialists, termed CGS. It is centered around the idea of employing a specialist for each target class, thus avoiding the dominance of larger targets. The generalist performs conventional multi-target segmentation, while each specialist is dedicated to distinguishing a specific target class from the remaining target classes and the background. Based on a theoretical insight, we demonstrate that CGS can achieve a more balanced training. Moreover, we develop cross-consistency losses to foster collaborative learning between the generalist and the specialists. Lastly, regarding their intrinsic relation that the target class of any specialized head should belong to the remaining classes of the other heads, we introduce an inter-head error detection module to further enhance the quality of pseudo-labels. Experimental results on three popular benchmarks showcase its superior performance compared to state-of-the-art methods. Our code is available at <monospace><uri>https://github.com/wangyou0804/CGS</uri></monospace>.","PeriodicalId":94033,"journal":{"name":"IEEE transactions on medical imaging","volume":"44 7","pages":"3025-3037"},"PeriodicalIF":0.0,"publicationDate":"2025-04-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143775360","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}