{"title":"SFPL: Sample-specific fine-grained prototype learning for imbalanced medical image classification","authors":"","doi":"10.1016/j.media.2024.103281","DOIUrl":"10.1016/j.media.2024.103281","url":null,"abstract":"<div><p>Imbalanced classification is a common and difficult task in many medical image analysis applications. However, most existing approaches focus on balancing feature distribution and classifier weights between classes, while ignoring the inner-class heterogeneity and the individuality of each sample. In this paper, we proposed a sample-specific fine-grained prototype learning (SFPL) method to learn the fine-grained representation of the majority class and learn a cosine classifier specifically for each sample such that the classification model is highly tuned to the individual’s characteristic. SFPL first builds multiple prototypes to represent the majority class, and then updates the prototypes through a mixture weighting strategy. Moreover, we proposed a uniform loss based on set representations to make the fine-grained prototypes distribute uniformly. To establish associations between fine-grained prototypes and cosine classifier, we propose a selective attention aggregation module to select the effective fine-grained prototypes for final classification. Extensive experiments on three different tasks demonstrate that SFPL outperforms the state-of-the-art (SOTA) methods. Importantly, as the imbalance ratio increases from 10 to 100, the improvement of SFPL over SOTA methods increases from 2.2% to 2.4%; as the training data decreases from 800 to 100, the improvement of SFPL over SOTA methods increases from 2.2% to 3.8%.</p></div>","PeriodicalId":18328,"journal":{"name":"Medical image analysis","volume":null,"pages":null},"PeriodicalIF":10.7,"publicationDate":"2024-07-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.sciencedirect.com/science/article/pii/S1361841524002068/pdfft?md5=5f8a1457ddfe3a57cc601f6bb9a7dca6&pid=1-s2.0-S1361841524002068-main.pdf","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141841709","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"EfficientQ: An efficient and accurate post-training neural network quantization method for medical image segmentation","authors":"","doi":"10.1016/j.media.2024.103277","DOIUrl":"10.1016/j.media.2024.103277","url":null,"abstract":"<div><p>Model quantization is a promising technique that can simultaneously compress and accelerate a deep neural network by limiting its computation bit-width, which plays a crucial role in the fast-growing AI industry. Despite model quantization’s success in producing well-performing low-bit models, the quantization process itself can still be expensive, which may involve a long fine-tuning stage on a large, well-annotated training set. To make the quantization process more efficient in terms of both time and data requirements, this paper proposes a fast and accurate post-training quantization method, namely EfficientQ. We develop this new method with a layer-wise optimization strategy and leverage the powerful alternating direction method of multipliers (ADMM) algorithm to ensure fast convergence. Furthermore, a weight regularization scheme is incorporated to provide more guidance for the optimization of the discrete weights, and a self-adaptive attention mechanism is proposed to combat the class imbalance problem. Extensive comparison and ablation experiments are conducted on two publicly available medical image segmentation datasets, i.e., LiTS and BraTS2020, and the results demonstrate the superiority of the proposed method over various existing post-training quantization methods in terms of both accuracy and optimization speed. Remarkably, with EfficientQ, the quantization of a practical 3D UNet only requires less than 5 min on a single GPU and one data sample. The source code is available at <span><span>https://github.com/rongzhao-zhang/EfficientQ</span><svg><path></path></svg></span>.</p></div>","PeriodicalId":18328,"journal":{"name":"Medical image analysis","volume":null,"pages":null},"PeriodicalIF":10.7,"publicationDate":"2024-07-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141843629","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"TransUNet: Rethinking the U-Net architecture design for medical image segmentation through the lens of transformers","authors":"","doi":"10.1016/j.media.2024.103280","DOIUrl":"10.1016/j.media.2024.103280","url":null,"abstract":"<div><p>Medical image segmentation is crucial for healthcare, yet convolution-based methods like U-Net face limitations in modeling long-range dependencies. To address this, Transformers designed for sequence-to-sequence predictions have been integrated into medical image segmentation. However, a comprehensive understanding of Transformers’ self-attention in U-Net components is lacking. TransUNet, first introduced in 2021, is widely recognized as one of the first models to integrate Transformer into medical image analysis. In this study, we present the versatile framework of TransUNet that encapsulates Transformers’ self-attention into two key modules: (1) a Transformer encoder tokenizing image patches from a convolution neural network (CNN) feature map, facilitating global context extraction, and (2) a Transformer decoder refining candidate regions through cross-attention between proposals and U-Net features. These modules can be flexibly inserted into the U-Net backbone, resulting in three configurations: Encoder-only, Decoder-only, and Encoder+Decoder. TransUNet provides a library encompassing both 2D and 3D implementations, enabling users to easily tailor the chosen architecture. Our findings highlight the encoder’s efficacy in modeling interactions among multiple abdominal organs and the decoder’s strength in handling small targets like tumors. It excels in diverse medical applications, such as multi-organ segmentation, pancreatic tumor segmentation, and hepatic vessel segmentation. Notably, our TransUNet achieves a significant average Dice improvement of 1.06% and 4.30% for multi-organ segmentation and pancreatic tumor segmentation, respectively, when compared to the highly competitive nn-UNet, and surpasses the top-1 solution in the BrasTS2021 challenge. 2D/3D Code and models are available at <span><span>https://github.com/Beckschen/TransUNet</span><svg><path></path></svg></span> and <span><span>https://github.com/Beckschen/TransUNet-3D</span><svg><path></path></svg></span>, respectively.</p></div>","PeriodicalId":18328,"journal":{"name":"Medical image analysis","volume":null,"pages":null},"PeriodicalIF":10.7,"publicationDate":"2024-07-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.sciencedirect.com/science/article/pii/S1361841524002056/pdfft?md5=06af9f939f40dc38d4e6d699170f92e0&pid=1-s2.0-S1361841524002056-main.pdf","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141840620","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Interpretable medical image Visual Question Answering via multi-modal relationship graph learning","authors":"","doi":"10.1016/j.media.2024.103279","DOIUrl":"10.1016/j.media.2024.103279","url":null,"abstract":"<div><p>Medical Visual Question Answering (VQA) is an important task in medical multi-modal Large Language Models (LLMs), aiming to answer clinically relevant questions regarding input medical images. This technique has the potential to improve the efficiency of medical professionals while relieving the burden on the public health system, particularly in resource-poor countries. However, existing medical VQA datasets are small and only contain simple questions (equivalent to classification tasks), which lack semantic reasoning and clinical knowledge. Our previous work proposed a clinical knowledge-driven image difference VQA benchmark using a rule-based approach (Hu et al., 2023). However, given the same breadth of information coverage, the rule-based approach shows an 85% error rate on extracted labels. We trained an LLM method to extract labels with 62% increased accuracy. We also comprehensively evaluated our labels with 2 clinical experts on 100 samples to help us fine-tune the LLM. Based on the trained LLM model, we proposed a large-scale medical VQA dataset, Medical-CXR-VQA, using LLMs focused on chest X-ray images. The questions involved detailed information, such as abnormalities, locations, levels, and types. Based on this dataset, we proposed a novel VQA method by constructing three different relationship graphs: spatial relationships, semantic relationships, and implicit relationship graphs on the image regions, questions, and semantic labels. We leveraged graph attention to learn the logical reasoning paths for different questions. These learned graph VQA reasoning paths can be further used for LLM prompt engineering and chain-of-thought, which are crucial for further fine-tuning and training multi-modal large language models. Moreover, we demonstrate that our approach has the qualities of evidence and faithfulness, which are crucial in the clinical field. The code and the dataset is available at <span><span>https://github.com/Holipori/Medical-CXR-VQA</span><svg><path></path></svg></span>.</p></div>","PeriodicalId":18328,"journal":{"name":"Medical image analysis","volume":null,"pages":null},"PeriodicalIF":10.7,"publicationDate":"2024-07-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141838474","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"PRSCS-Net: Progressive 3D/2D rigid Registration network with the guidance of Single-view Cycle Synthesis","authors":"","doi":"10.1016/j.media.2024.103283","DOIUrl":"10.1016/j.media.2024.103283","url":null,"abstract":"<div><p>The 3D/2D registration for 3D pre-operative images (computed tomography, CT) and 2D intra-operative images (X-ray) plays an important role in image-guided spine surgeries. Conventional iterative-based approaches suffer from time-consuming processes. Existing learning-based approaches require high computational costs and face poor performance on large misalignment because of projection-induced losses or ill-posed reconstruction. In this paper, we propose a Progressive 3D/2D rigid Registration network with the guidance of Single-view Cycle Synthesis, named PRSCS-Net. Specifically, we first introduce the differentiable backward/forward projection operator into the single-view cycle synthesis network, which reconstructs corresponding 3D geometry features from two 2D intra-operative view images (one from the input, and the other from the synthesis). In this way, the problem of limited views during reconstruction can be solved. Subsequently, we employ a self-reconstruction path to extract latent representation from pre-operative 3D CT images. The following pose estimation process will be performed in the 3D geometry feature space, which can solve the dimensional gap, greatly reduce the computational complexity, and ensure that the features extracted from pre-operative and intra-operative images are as relevant as possible to pose estimation. Furthermore, to enhance the ability of our model for handling large misalignment, we develop a progressive registration path, including two sub-registration networks, aiming to estimate the pose parameters via two-step warping volume features. Finally, our proposed method has been evaluated on a public dataset CTSpine1k and an in-house dataset C-ArmLSpine for 3D/2D registration. Results demonstrate that PRSCS-Net achieves state-of-the-art registration performance in terms of registration accuracy, robustness, and generalizability compared with existing methods. Thus, PRSCS-Net has potential for clinical spinal disease surgical planning and surgical navigation systems.</p></div>","PeriodicalId":18328,"journal":{"name":"Medical image analysis","volume":null,"pages":null},"PeriodicalIF":10.7,"publicationDate":"2024-07-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141848252","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"FetMRQC: A robust quality control system for multi-centric fetal brain MRI","authors":"","doi":"10.1016/j.media.2024.103282","DOIUrl":"10.1016/j.media.2024.103282","url":null,"abstract":"<div><p>Fetal brain MRI is becoming an increasingly relevant complement to neurosonography for perinatal diagnosis, allowing fundamental insights into fetal brain development throughout gestation. However, uncontrolled fetal motion and heterogeneity in acquisition protocols lead to data of variable quality, potentially biasing the outcome of subsequent studies. We present FetMRQC, an open-source machine-learning framework for automated image quality assessment and quality control that is robust to domain shifts induced by the heterogeneity of clinical data. FetMRQC extracts an ensemble of quality metrics from unprocessed anatomical MRI and combines them to predict experts’ ratings using random forests. We validate our framework on a pioneeringly large and diverse dataset of more than 1600 manually rated fetal brain T2-weighted images from four clinical centers and 13 different scanners. Our study shows that FetMRQC’s predictions generalize well to unseen data while being interpretable. FetMRQC is a step towards more robust fetal brain neuroimaging, which has the potential to shed new insights on the developing human brain.</p></div>","PeriodicalId":18328,"journal":{"name":"Medical image analysis","volume":null,"pages":null},"PeriodicalIF":10.7,"publicationDate":"2024-07-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.sciencedirect.com/science/article/pii/S136184152400207X/pdfft?md5=b8161959d3d40e266571c6882081e246&pid=1-s2.0-S136184152400207X-main.pdf","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141759619","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Generating multi-pathological and multi-modal images and labels for brain MRI","authors":"","doi":"10.1016/j.media.2024.103278","DOIUrl":"10.1016/j.media.2024.103278","url":null,"abstract":"<div><p>The last few years have seen a boom in using generative models to augment real datasets, as synthetic data can effectively model real data distributions and provide privacy-preserving, shareable datasets that can be used to train deep learning models. However, most of these methods are 2D and provide synthetic datasets that come, at most, with categorical annotations. The generation of paired images and segmentation samples that can be used in downstream, supervised segmentation tasks remains fairly uncharted territory. This work proposes a two-stage generative model capable of producing 2D and 3D semantic label maps and corresponding multi-modal images. We use a latent diffusion model for label synthesis and a VAE-GAN for semantic image synthesis. Synthetic datasets provided by this model are shown to work in a wide variety of segmentation tasks, supporting small, real datasets or fully replacing them while maintaining good performance. We also demonstrate its ability to improve downstream performance on out-of-distribution data.</p></div>","PeriodicalId":18328,"journal":{"name":"Medical image analysis","volume":null,"pages":null},"PeriodicalIF":10.7,"publicationDate":"2024-07-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.sciencedirect.com/science/article/pii/S1361841524002032/pdfft?md5=89439036fb9da6236b70017d5e66b48f&pid=1-s2.0-S1361841524002032-main.pdf","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141766575","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Generating synthetic computed tomography for radiotherapy: SynthRAD2023 challenge report","authors":"","doi":"10.1016/j.media.2024.103276","DOIUrl":"10.1016/j.media.2024.103276","url":null,"abstract":"<div><p>Radiation therapy plays a crucial role in cancer treatment, necessitating precise delivery of radiation to tumors while sparing healthy tissues over multiple days. Computed tomography (CT) is integral for treatment planning, offering electron density data crucial for accurate dose calculations. However, accurately representing patient anatomy is challenging, especially in adaptive radiotherapy, where CT is not acquired daily. Magnetic resonance imaging (MRI) provides superior soft-tissue contrast. Still, it lacks electron density information, while cone beam CT (CBCT) lacks direct electron density calibration and is mainly used for patient positioning.</p><p>Adopting MRI-only or CBCT-based adaptive radiotherapy eliminates the need for CT planning but presents challenges. Synthetic CT (sCT) generation techniques aim to address these challenges by using image synthesis to bridge the gap between MRI, CBCT, and CT. The SynthRAD2023 challenge was organized to compare synthetic CT generation methods using multi-center ground truth data from 1080 patients, divided into two tasks: (1) MRI-to-CT and (2) CBCT-to-CT. The evaluation included image similarity and dose-based metrics from proton and photon plans.</p><p>The challenge attracted significant participation, with 617 registrations and 22/17 valid submissions for tasks 1/2. Top-performing teams achieved high structural similarity indices (<span><math><mrow><mo>≥</mo><mn>0</mn><mo>.</mo><mn>87</mn><mo>/</mo><mn>0</mn><mo>.</mo><mn>90</mn></mrow></math></span>) and gamma pass rates for photon (<span><math><mrow><mo>≥</mo><mn>98</mn><mo>.</mo><mn>1</mn><mtext>%</mtext><mo>/</mo><mn>99</mn><mo>.</mo><mn>0</mn><mtext>%</mtext></mrow></math></span>) and proton (<span><math><mrow><mo>≥</mo><mn>97</mn><mo>.</mo><mn>3</mn><mtext>%</mtext><mo>/</mo><mn>97</mn><mo>.</mo><mn>0</mn><mtext>%</mtext></mrow></math></span>) plans. However, no significant correlation was found between image similarity metrics and dose accuracy, emphasizing the need for dose evaluation when assessing the clinical applicability of sCT.</p><p>SynthRAD2023 facilitated the investigation and benchmarking of sCT generation techniques, providing insights for developing MRI-only and CBCT-based adaptive radiotherapy. It showcased the growing capacity of deep learning to produce high-quality sCT, reducing reliance on conventional CT for treatment planning.</p></div>","PeriodicalId":18328,"journal":{"name":"Medical image analysis","volume":null,"pages":null},"PeriodicalIF":10.7,"publicationDate":"2024-07-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.sciencedirect.com/science/article/pii/S1361841524002019/pdfft?md5=ac2a85a25668e619ece44f14ec077f0d&pid=1-s2.0-S1361841524002019-main.pdf","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141788582","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Diffusion tensor transformation for personalizing target volumes in radiation therapy","authors":"","doi":"10.1016/j.media.2024.103271","DOIUrl":"10.1016/j.media.2024.103271","url":null,"abstract":"<div><p>Diffusion tensor imaging (DTI) is used in tumor growth models to provide information on the infiltration pathways of tumor cells into the surrounding brain tissue. When a patient-specific DTI is not available, a template image such as a DTI atlas can be transformed to the patient anatomy using image registration. This study investigates a model, the invariance under coordinate transform (ICT), that transforms diffusion tensors from a template image to the patient image, based on the principle that the tumor growth process can be mapped, at any point in time, between the images using the same transformation function that we use to map the anatomy. The ICT model allows the mapping of tumor cell densities and tumor fronts (as iso-levels of tumor cell density) from the template image to the patient image for inclusion in radiotherapy treatment planning. The proposed approach transforms the diffusion tensors to simulate tumor growth in locally deformed anatomy and outputs the tumor cell density distribution over time. The ICT model is validated in a cohort of ten brain tumor patients. Comparative analysis with the tumor cell density in the original template image shows that the ICT model accurately simulates tumor cell densities in the deformed image space. By creating radiotherapy target volumes as tumor fronts, this study provides a framework for more personalized radiotherapy treatment planning, without the use of additional imaging.</p></div>","PeriodicalId":18328,"journal":{"name":"Medical image analysis","volume":null,"pages":null},"PeriodicalIF":10.7,"publicationDate":"2024-07-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141736378","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"DMSPS: Dynamically mixed soft pseudo-label supervision for scribble-supervised medical image segmentation","authors":"","doi":"10.1016/j.media.2024.103274","DOIUrl":"10.1016/j.media.2024.103274","url":null,"abstract":"<div><p>High performance of deep learning on medical image segmentation rely on large-scale pixel-level dense annotations, which poses a substantial burden on medical experts due to the laborious and time-consuming annotation process, particularly for 3D images. To reduce the labeling cost as well as maintain relatively satisfactory segmentation performance, weakly-supervised learning with sparse labels has attained increasing attentions. In this work, we present a scribble-based framework for medical image segmentation, called Dynamically Mixed Soft Pseudo-label Supervision (DMSPS). Concretely, we extend a backbone with an auxiliary decoder to form a dual-branch network to enhance the feature capture capability of the shared encoder. Considering that most pixels do not have labels and hard pseudo-labels tend to be over-confident to result in poor segmentation, we propose to use soft pseudo-labels generated by dynamically mixing the decoders’ predictions as auxiliary supervision. To further enhance the model’s performance, we adopt a two-stage approach where the sparse scribbles are expanded based on predictions with low uncertainties from the first-stage model, leading to more annotated pixels to train the second-stage model. Experiments on ACDC dataset for cardiac structure segmentation, WORD dataset for 3D abdominal organ segmentation and BraTS2020 dataset for 3D brain tumor segmentation showed that: (1) compared with the baseline, our method improved the average <span><math><mrow><mi>D</mi><mi>S</mi><mi>C</mi></mrow></math></span> from 50.46% to 89.51%, from 75.46% to 87.56% and from 52.61% to 76.53% on the three datasets, respectively; (2) DMSPS achieved better performance than five state-of-the-art scribble-supervised segmentation methods, and is generalizable to different segmentation backbones. The code is available online at: <span><span>https://github.com/HiLab-git/DMSPS</span><svg><path></path></svg></span>.</p></div>","PeriodicalId":18328,"journal":{"name":"Medical image analysis","volume":null,"pages":null},"PeriodicalIF":10.7,"publicationDate":"2024-07-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141714277","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}