{"title":"SISMIK for Brain MRI: Deep-Learning-Based Motion Estimation and Model-Based Motion Correction in k-Space","authors":"Oscar Dabrowski;Jean-Luc Falcone;Antoine Klauser;Julien Songeon;Michel Kocher;Bastien Chopard;François Lazeyras;Sébastien Courvoisier","doi":"10.1109/TMI.2024.3446450","DOIUrl":"10.1109/TMI.2024.3446450","url":null,"abstract":"MRI, a widespread non-invasive medical imaging modality, is highly sensitive to patient motion. Despite many attempts over the years, motion correction remains a difficult problem and there is no general method applicable to all situations. We propose a retrospective method for motion estimation and correction to tackle the problem of in-plane rigid-body motion, apt for classical 2D Spin-Echo scans of the brain, which are regularly used in clinical practice. Due to the sequential acquisition of k-space, motion artifacts are well localized. The method leverages the power of deep neural networks to estimate motion parameters in k-space and uses a model-based approach to restore degraded images to avoid “hallucinations”. Notable advantages are its ability to estimate motion occurring in high spatial frequencies without the need of a motion-free reference. The proposed method operates on the whole k-space dynamic range and is moderately affected by the lower SNR of higher harmonics. As a proof of concept, we provide models trained using supervised learning on 600k motion simulations based on motion-free scans of 43 different subjects. Generalization performance was tested with simulations as well as in-vivo. Qualitative and quantitative evaluations are presented for motion parameter estimations and image reconstruction. Experimental results show that our approach is able to obtain good generalization performance on simulated data and in-vivo acquisitions. We provide a Python implementation at \u0000<uri>https://gitlab.unige.ch/Oscar.Dabrowski/sismik_mri/</uri>\u0000.","PeriodicalId":94033,"journal":{"name":"IEEE transactions on medical imaging","volume":"44 1","pages":"396-408"},"PeriodicalIF":0.0,"publicationDate":"2024-08-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142006161","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Investigating and Improving Latent Density Segmentation Models for Aleatoric Uncertainty Quantification in Medical Imaging","authors":"M. M. Amaan Valiuddin;Christiaan G. A. Viviers;Ruud J. G. van Sloun;Peter H. N. de With;Fons van der Sommen","doi":"10.1109/TMI.2024.3445999","DOIUrl":"10.1109/TMI.2024.3445999","url":null,"abstract":"Data uncertainties, such as sensor noise, occlusions or limitations in the acquisition method can introduce irreducible ambiguities in images, which result in varying, yet plausible, semantic hypotheses. In Machine Learning, this ambiguity is commonly referred to as aleatoric uncertainty. In image segmentation, latent density models can be utilized to address this problem. The most popular approach is the Probabilistic U-Net (PU-Net), which uses latent Normal densities to optimize the conditional data log-likelihood Evidence Lower Bound. In this work, we demonstrate that the PU-Net latent space is severely sparse and heavily under-utilized. To address this, we introduce mutual information maximization and entropy-regularized Sinkhorn Divergence in the latent space to promote homogeneity across all latent dimensions, effectively improving gradient-descent updates and latent space informativeness. Our results show that by applying this on public datasets of various clinical segmentation problems, our proposed methodology receives up to 11% performance gains compared against preceding latent variable models for probabilistic segmentation on the Hungarian-Matched Intersection over Union. The results indicate that encouraging a homogeneous latent space significantly improves latent density modeling for medical image segmentation.","PeriodicalId":94033,"journal":{"name":"IEEE transactions on medical imaging","volume":"44 1","pages":"384-395"},"PeriodicalIF":0.0,"publicationDate":"2024-08-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142006160","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"S²Former-OR: Single-Stage Bi-Modal Transformer for Scene Graph Generation in OR","authors":"Jialun Pei;Diandian Guo;Jingyang Zhang;Manxi Lin;Yueming Jin;Pheng-Ann Heng","doi":"10.1109/TMI.2024.3444279","DOIUrl":"10.1109/TMI.2024.3444279","url":null,"abstract":"Scene graph generation (SGG) of surgical procedures is crucial in enhancing holistically cognitive intelligence in the operating room (OR). However, previous works have primarily relied on multi-stage learning, where the generated semantic scene graphs depend on intermediate processes with pose estimation and object detection. This pipeline may potentially compromise the flexibility of learning multimodal representations, consequently constraining the overall effectiveness. In this study, we introduce a novel single-stage bi-modal transformer framework for SGG in the OR, termed S2Former-OR, aimed to complementally leverage multi-view 2D scenes and 3D point clouds for SGG in an end-to-end manner. Concretely, our model embraces a View-Sync Transfusion scheme to encourage multi-view visual information interaction. Concurrently, a Geometry-Visual Cohesion operation is designed to integrate the synergic 2D semantic features into 3D point cloud features. Moreover, based on the augmented feature, we propose a novel relation-sensitive transformer decoder that embeds dynamic entity-pair queries and relational trait priors, which enables the direct prediction of entity-pair relations for graph generation without intermediate steps. Extensive experiments have validated the superior SGG performance and lower computational cost of S2Former-OR on 4D-OR benchmark, compared with current OR-SGG methods, e.g., 3 percentage points Precision increase and 24.2M reduction in model parameters. We further compared our method with generic single-stage SGG methods with broader metrics for a comprehensive evaluation, with consistently better performance achieved. Our source code can be made available at: \u0000<uri>https://github.com/PJLallen/S2Former-OR</uri>\u0000.","PeriodicalId":94033,"journal":{"name":"IEEE transactions on medical imaging","volume":"44 1","pages":"361-372"},"PeriodicalIF":0.0,"publicationDate":"2024-08-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141989773","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"AutoSamp: Autoencoding k-Space Sampling via Variational Information Maximization for 3D MRI","authors":"Cagan Alkan;Morteza Mardani;Congyu Liao;Zhitao Li;Shreyas S. Vasanawala;John M. Pauly","doi":"10.1109/TMI.2024.3443292","DOIUrl":"10.1109/TMI.2024.3443292","url":null,"abstract":"Accelerated MRI protocols routinely involve a predefined sampling pattern that undersamples the k-space. Finding an optimal pattern can enhance the reconstruction quality, however this optimization is a challenging task. To address this challenge, we introduce a novel deep learning framework, AutoSamp, based on variational information maximization that enables joint optimization of sampling pattern and reconstruction of MRI scans. We represent the encoder as a non-uniform Fast Fourier Transform that allows continuous optimization of k-space sample locations on a non-Cartesian plane, and the decoder as a deep reconstruction network. Experiments on public 3D acquired MRI datasets show improved reconstruction quality of the proposed AutoSamp method over the prevailing variable density and variable density Poisson disc sampling for both compressed sensing and deep learning reconstructions. We demonstrate that our data-driven sampling optimization method achieves 4.4dB, 2.0dB, 0.75dB, 0.7dB PSNR improvements over reconstruction with Poisson Disc masks for acceleration factors of R =5, 10, 15, 25, respectively. Prospectively accelerated acquisitions with 3D FSE sequences using our optimized sampling patterns exhibit improved image quality and sharpness. Furthermore, we analyze the characteristics of the learned sampling patterns with respect to changes in acceleration factor, measurement noise, underlying anatomy, and coil sensitivities. We show that all these factors contribute to the optimization result by affecting the sampling density, k-space coverage and point spread functions of the learned sampling patterns.","PeriodicalId":94033,"journal":{"name":"IEEE transactions on medical imaging","volume":"44 1","pages":"270-283"},"PeriodicalIF":0.0,"publicationDate":"2024-08-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141989744","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Bi-Constraints Diffusion: A Conditional Diffusion Model with Degradation Guidance for Metal Artifact Reduction.","authors":"Mengting Luo, Nan Zhou, Tao Wang, Linchao He, Wang Wang, Hu Chen, Peixi Liao, Yi Zhang","doi":"10.1109/TMI.2024.3442950","DOIUrl":"10.1109/TMI.2024.3442950","url":null,"abstract":"<p><p>In recent years, score-based diffusion models have emerged as effective tools for estimating score functions from empirical data distributions, particularly in integrating implicit priors with inverse problems like CT reconstruction. However, score-based diffusion models are rarely explored in challenging tasks such as metal artifact reduction (MAR). In this paper, we introduce the BiConstraints Diffusion Model for Metal Artifact Reduction (BCDMAR), an innovative approach that enhances iterative reconstruction with a conditional diffusion model for MAR. This method employs a metal artifact degradation operator in place of the traditional metal-excluded projection operator in the data-fidelity term, thereby preserving structure details around metal regions. However, scorebased diffusion models tend to be susceptible to grayscale shifts and unreliable structures, making it challenging to reach an optimal solution. To address this, we utilize a precorrected image as a prior constraint, guiding the generation of the score-based diffusion model. By iteratively applying the score-based diffusion model and the data-fidelity step in each sampling iteration, BCDMAR effectively maintains reliable tissue representation around metal regions and produces highly consistent structures in non-metal regions. Through extensive experiments focused on metal artifact reduction tasks, BCDMAR demonstrates superior performance over other state-of-the-art unsupervised and supervised methods, both quantitatively and in terms of visual results.</p>","PeriodicalId":94033,"journal":{"name":"IEEE transactions on medical imaging","volume":"PP ","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-08-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141989745","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Domain-interactive Contrastive Learning and Prototype-guided Self-training for Cross-domain Polyp Segmentation.","authors":"Ziru Lu, Yizhe Zhang, Yi Zhou, Ye Wu, Tao Zhou","doi":"10.1109/TMI.2024.3443262","DOIUrl":"https://doi.org/10.1109/TMI.2024.3443262","url":null,"abstract":"<p><p>Accurate polyp segmentation plays a critical role from colonoscopy images in the diagnosis and treatment of colorectal cancer. While deep learning-based polyp segmentation models have made significant progress, they often suffer from performance degradation when applied to unseen target domain datasets collected from different imaging devices. To address this challenge, unsupervised domain adaptation (UDA) methods have gained attention by leveraging labeled source data and unlabeled target data to reduce the domain gap. However, existing UDA methods primarily focus on capturing class-wise representations, neglecting domain-wise representations. Additionally, uncertainty in pseudo labels could hinder the segmentation performance. To tackle these issues, we propose a novel Domain-interactive Contrastive Learning and Prototype-guided Self-training (DCL-PS) framework for cross-domain polyp segmentation. Specifically, domaininteractive contrastive learning (DCL) with a domain-mixed prototype updating strategy is proposed to discriminate class-wise feature representations across domains. Then, to enhance the feature extraction ability of the encoder, we present a contrastive learning-based cross-consistency training (CL-CCT) strategy, which is imposed on both the prototypes obtained by the outputs of the main decoder and perturbed auxiliary outputs. Furthermore, we propose a prototype-guided self-training (PS) strategy, which dynamically assigns a weight for each pixel during selftraining, filtering out unreliable pixels and improving the quality of pseudo-labels. Experimental results demonstrate the superiority of DCL-PS in improving polyp segmentation performance in the target domain. The code will be released at https://github.com/taozh2017/DCLPS.</p>","PeriodicalId":94033,"journal":{"name":"IEEE transactions on medical imaging","volume":"PP ","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-08-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141984206","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Prompt-Driven Latent Domain Generalization for Medical Image Classification","authors":"Siyuan Yan;Zhen Yu;Chi Liu;Lie Ju;Dwarikanath Mahapatra;Brigid Betz-Stablein;Victoria Mar;Monika Janda;Peter Soyer;Zongyuan Ge","doi":"10.1109/TMI.2024.3443119","DOIUrl":"10.1109/TMI.2024.3443119","url":null,"abstract":"Deep learning models for medical image analysis easily suffer from distribution shifts caused by dataset artifact bias, camera variations, differences in the imaging station, etc., leading to unreliable diagnoses in real-world clinical settings. Domain generalization (DG) methods, which aim to train models on multiple domains to perform well on unseen domains, offer a promising direction to solve the problem. However, existing DG methods assume domain labels of each image are available and accurate, which is typically feasible for only a limited number of medical datasets. To address these challenges, we propose a unified DG framework for medical image classification without relying on domain labels, called Prompt-driven Latent Domain Generalization (PLDG). PLDG consists of unsupervised domain discovery and prompt learning. This framework first discovers pseudo domain labels by clustering the bias-associated style features, then leverages collaborative domain prompts to guide a Vision Transformer to learn knowledge from discovered diverse domains. To facilitate cross-domain knowledge learning between different prompts, we introduce a domain prompt generator that enables knowledge sharing between domain prompts and a shared prompt. A domain mixup strategy is additionally employed for more flexible decision margins and mitigates the risk of incorrect domain assignments. Extensive experiments on three medical image classification tasks and one debiasing task demonstrate that our method can achieve comparable or even superior performance than conventional DG algorithms without relying on domain labels. Our code is publicly available at \u0000<uri>https://github.com/SiyuanYan1/PLDG/tree/main</uri>\u0000.","PeriodicalId":94033,"journal":{"name":"IEEE transactions on medical imaging","volume":"44 1","pages":"348-360"},"PeriodicalIF":0.0,"publicationDate":"2024-08-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141977493","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A New Benchmark: Clinical Uncertainty and Severity Aware Labeled Chest X-Ray Images With Multi-Relationship Graph Learning","authors":"Mengliang Zhang;Xinyue Hu;Lin Gu;Liangchen Liu;Kazuma Kobayashi;Tatsuya Harada;Yan Yan;Ronald M. Summers;Yingying Zhu","doi":"10.1109/TMI.2024.3441494","DOIUrl":"10.1109/TMI.2024.3441494","url":null,"abstract":"Chest radiography, commonly known as CXR, is frequently utilized in clinical settings to detect cardiopulmonary conditions. However, even seasoned radiologists might offer different evaluations regarding the seriousness and uncertainty associated with observed abnormalities. Previous research has attempted to utilize clinical notes to extract abnormal labels for training deep-learning models in CXR image diagnosis. However, these methods often neglected the varying degrees of severity and uncertainty linked to different labels. In our study, we initially assembled a comprehensive new dataset of CXR images based on clinical textual data, which incorporated radiologists’ assessments of uncertainty and severity. Using this dataset, we introduced a multi-relationship graph learning framework that leverages spatial and semantic relationships while addressing expert uncertainty through a dedicated loss function. Our research showcases a notable enhancement in CXR image diagnosis and the interpretability of the diagnostic model, surpassing existing state-of-the-art methodologies. The dataset address of disease severity and uncertainty we extracted is: \u0000<uri>https://physionet.org/content/cad-chest/1.0/</uri>\u0000.","PeriodicalId":94033,"journal":{"name":"IEEE transactions on medical imaging","volume":"44 1","pages":"338-347"},"PeriodicalIF":0.0,"publicationDate":"2024-08-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141910184","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"RemixFormer++: A Multi-Modal Transformer Model for Precision Skin Tumor Differential Diagnosis With Memory-Efficient Attention","authors":"Jing Xu;Kai Huang;Lianzhen Zhong;Yuan Gao;Kai Sun;Wei Liu;Yanjie Zhou;Wenchao Guo;Yuan Guo;Yuanqiang Zou;Yuping Duan;Le Lu;Yu Wang;Xiang Chen;Shuang Zhao","doi":"10.1109/TMI.2024.3441012","DOIUrl":"10.1109/TMI.2024.3441012","url":null,"abstract":"Diagnosing malignant skin tumors accurately at an early stage can be challenging due to ambiguous and even confusing visual characteristics displayed by various categories of skin tumors. To improve diagnosis precision, all available clinical data from multiple sources, particularly clinical images, dermoscopy images, and medical history, could be considered. Aligning with clinical practice, we propose a novel Transformer model, named RemixFormer++ that consists of a clinical image branch, a dermoscopy image branch, and a metadata branch. Given the unique characteristics inherent in clinical and dermoscopy images, specialized attention strategies are adopted for each type. Clinical images are processed through a top-down architecture, capturing both localized lesion details and global contextual information. Conversely, dermoscopy images undergo a bottom-up processing with two-level hierarchical encoders, designed to pinpoint fine-grained structural and textural features. A dedicated metadata branch seamlessly integrates non-visual information by encoding relevant patient data. Fusing features from three branches substantially boosts disease classification accuracy. RemixFormer++ demonstrates exceptional performance on four single-modality datasets (PAD-UFES-20, ISIC 2017/2018/2019). Compared with the previous best method using a public multi-modal Derm7pt dataset, we achieved an absolute 5.3% increase in averaged F1 and 1.2% in accuracy for the classification of five skin tumors. Furthermore, using a large-scale in-house dataset of 10,351 patients with the twelve most common skin tumors, our method obtained an overall classification accuracy of 92.6%. These promising results, on par or better with the performance of 191 dermatologists through a comprehensive reader study, evidently imply the potential clinical usability of our method.","PeriodicalId":94033,"journal":{"name":"IEEE transactions on medical imaging","volume":"44 1","pages":"320-337"},"PeriodicalIF":0.0,"publicationDate":"2024-08-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141910185","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"PRECISION: A Physics-Constrained and Noise-Controlled Diffusion Model for Photon Counting Computed Tomography","authors":"Ruifeng Chen;Zhongliang Zhang;Guotao Quan;Yanfeng Du;Yang Chen;Yinsheng Li","doi":"10.1109/TMI.2024.3440651","DOIUrl":"10.1109/TMI.2024.3440651","url":null,"abstract":"Recently, the use of photon counting detectors in computed tomography (PCCT) has attracted extensive attention. It is highly desired to improve the quality of material basis image and the quantitative accuracy of elemental composition, particularly when PCCT data is acquired at lower radiation dose levels. In this work, we develop a \u0000<underline>p</u>\u0000hysics-const\u0000<underline>r</u>\u0000ained and nois\u0000<underline>e</u>\u0000-\u0000<underline>c</u>\u0000ontrolled d\u0000<underline>i</u>\u0000ffu\u0000<underline>sion</u>\u0000 model, PRECISION in short, to address the degraded quality of material basis images and inaccurate quantification of elemental composition mainly caused by imperfect noise model and/or hand-crafted regularization of material basis images, such as local smoothness and/or sparsity, leveraged in the existing direct material basis image reconstruction approaches. In stark contrast, PRECISION learns distribution-level regularization to describe the feature of ideal material basis images via training a noise-controlled spatial-spectral diffusion model. The optimal material basis images of each individual subject are sampled from this learned distribution under the constraint of the physical model of a given PCCT and the measured data obtained from the subject. PRECISION exhibits the potential to improve the quality of material basis images and the quantitative accuracy of elemental composition for PCCT.","PeriodicalId":94033,"journal":{"name":"IEEE transactions on medical imaging","volume":"43 10","pages":"3476-3489"},"PeriodicalIF":0.0,"publicationDate":"2024-08-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141908635","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}