{"title":"SAMCT: Segment Any CT Allowing Labor-Free Task-Indicator Prompts","authors":"Xian Lin;Yangyang Xiang;Zhehao Wang;Kwang-Ting Cheng;Zengqiang Yan;Li Yu","doi":"10.1109/TMI.2024.3493456","DOIUrl":"10.1109/TMI.2024.3493456","url":null,"abstract":"Segment anything model (SAM), a foundation model with superior versatility and generalization across diverse segmentation tasks, has attracted widespread attention in medical imaging. However, it has been proved that SAM would encounter severe performance degradation due to the lack of medical knowledge in training and local feature encoding. Though several SAM-based models have been proposed for tuning SAM in medical imaging, they still suffer from insufficient feature extraction and highly rely on high-quality prompts. In this paper, we propose a powerful foundation model SAMCT allowing labor-free prompts and train it on a collected large CT dataset consisting of 1.1M CT images and 5M masks from public datasets. Specifically, based on SAM, SAMCT is further equipped with a U-shaped CNN image encoder, a cross-branch interaction module, and a task-indicator prompt encoder. The U-shaped CNN image encoder works in parallel with the ViT image encoder in SAM to supplement local features. Cross-branch interaction enhances the feature expression capability of the CNN image encoder and the ViT image encoder by exchanging global perception and local features from one to the other. The task-indicator prompt encoder is a plug-and-play component to effortlessly encode task-related indicators into prompt embeddings. In this way, SAMCT can work in an automatic manner in addition to the semi-automatic interactive strategy in SAM. Extensive experiments demonstrate the superiority of SAMCT against the state-of-the-art task-specific and SAM-based medical foundation models on various tasks. 
The code, data, and model checkpoints are available at <uri>https://github.com/xianlin7/SAMCT</uri>.","PeriodicalId":94033,"journal":{"name":"IEEE transactions on medical imaging","volume":"44 3","pages":"1386-1399"},"PeriodicalIF":0.0,"publicationDate":"2024-11-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142596938","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Toward Integrating Federated Learning With Split Learning via Spatio-Temporal Graph Framework for Brain Disease Prediction","authors":"Junbin Mao;Jin Liu;Xu Tian;Yi Pan;Emanuele Trucco;Hanhe Lin","doi":"10.1109/TMI.2024.3493195","DOIUrl":"10.1109/TMI.2024.3493195","url":null,"abstract":"Functional Magnetic Resonance Imaging (fMRI) is used for extracting blood oxygen signals from brain regions to map brain functional connectivity for brain disease prediction. Despite its effectiveness, fMRI has not been widely used: on the one hand, collecting and labeling the data is time-consuming and costly, which limits the amount of valid data collected at a single healthcare site; on the other hand, integrating data from multiple sites is challenging due to data privacy restrictions. To address these issues, we propose a novel, integrated Federated learning and Split learning Spatio-temporal Graph framework (F<inline-formula> <tex-math>$text {S}^{{2}}$ </tex-math></inline-formula>G). Specifically, we introduce federated learning and split learning techniques to split a spatio-temporal model into a client temporal model and a server spatial model. In the client temporal model, we propose a time-aware mechanism to focus on changes in brain functional states and use an InceptionTime model to extract information about changes in the brain states of each subject. In the server spatial model, we propose a united graph convolutional network to integrate multiple graph convolutional networks. Integrating federated learning and split learning, F<inline-formula> <tex-math>$text {S}^{{2}}$ </tex-math></inline-formula>G can utilize multi-site fMRI data without violating data privacy protection and reduce the risk of overfitting as it is capable of learning from limited training data sets. Moreover, it boosts the extraction of spatio-temporal features of fMRI using spatio-temporal graph networks. 
Experiments on the ABIDE and ADHD200 datasets demonstrate that our proposed method outperforms state-of-the-art methods. In addition, we explore biomarkers associated with brain disease prediction using community discovery algorithms applied to intermediate results of F<inline-formula> <tex-math>$\\text{S}^{2}$ </tex-math></inline-formula>G. The source code is available at <uri>https://github.com/yutian0315/FS2G</uri>.","PeriodicalId":94033,"journal":{"name":"IEEE transactions on medical imaging","volume":"44 3","pages":"1334-1346"},"PeriodicalIF":0.0,"publicationDate":"2024-11-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142596940","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Building a Synthetic Vascular Model: Evaluation in an Intracranial Aneurysms Detection Scenario","authors":"Rafic Nader;Florent Autrusseau;Vincent L’Allinec;Romain Bourcier","doi":"10.1109/TMI.2024.3492313","DOIUrl":"10.1109/TMI.2024.3492313","url":null,"abstract":"We hereby present a full synthetic model, able to mimic the various constituents of the cerebral vascular tree, including the cerebral arteries, bifurcations and intracranial aneurysms. This model intends to provide a substantial dataset of brain arteries which could be used by a 3D convolutional neural network to efficiently detect Intra-Cranial Aneurysms. The cerebral aneurysms most often occur on a particular structure of the vascular tree named the Circle of Willis. Various studies have been conducted to detect and monitor the aneurysms and those based on Deep Learning achieve the best performance. Specifically, in this work, we propose a full synthetic 3D model able to mimic the brain vasculature as acquired by Magnetic Resonance Angiography, Time Of Flight principle. Among the various MRI modalities, this latter allows for a good rendering of the blood vessels and is non-invasive. Our model has been designed to simultaneously mimic the arteries’ geometry, the aneurysm shape, and the background noise. The vascular tree geometry is modeled thanks to an interpolation with 3D Spline functions, and the statistical properties of the background noise is collected from angiography acquisitions and reproduced within the model. 
In this work, we thoroughly describe the synthetic vasculature model, build a neural network designed for aneurysm segmentation and detection, and finally carry out an in-depth evaluation of the performance gain obtained thanks to the synthetic-model data augmentation.","PeriodicalId":94033,"journal":{"name":"IEEE transactions on medical imaging","volume":"44 3","pages":"1347-1358"},"PeriodicalIF":0.0,"publicationDate":"2024-11-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142591694","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"FAMF-Net: Feature Alignment Mutual Attention Fusion With Region Awareness for Breast Cancer Diagnosis via Imbalanced Data","authors":"Yiyao Liu;Jinyao Li;Cheng Zhao;Yongtao Zhang;Qian Chen;Jing Qin;Lei Dong;Tianfu Wang;Wei Jiang;Baiying Lei","doi":"10.1109/TMI.2024.3485612","DOIUrl":"10.1109/TMI.2024.3485612","url":null,"abstract":"Automatic and accurate classification of breast cancer in multimodal ultrasound images is crucial to improve patients’ diagnosis and treatment effect and save medical resources. Methodologically, the fusion of multimodal ultrasound images often encounters challenges such as misalignment, limited utilization of complementary information, poor interpretability in feature fusion, and imbalances in sample categories. To solve these problems, we propose a feature alignment mutual attention fusion method (FAMF-Net), which consists of a region awareness alignment (RAA) block, a mutual attention fusion (MAF) block, and a reinforcement learning-based dynamic optimization strategy(RDO). Specifically, RAA achieves region awareness through class activation mapping and performs translation transformation to achieve feature alignment. When MAF utilizes a mutual attention mechanism for feature interaction fusion, it mines edge and color features separately in B-mode and shear wave elastography images, enhancing the complementarity of features and improving interpretability. Finally, RDO uses the distribution of samples and prediction probabilities during training as the state of reinforcement learning to dynamically optimize the weights of the loss function, thereby solving the problem of class imbalance. The experimental results based on our clinically obtained dataset demonstrate the effectiveness of the proposed method. 
Our code will be available at: <uri>https://github.com/Magnety/Multi_modal_Image</uri>","PeriodicalId":94033,"journal":{"name":"IEEE transactions on medical imaging","volume":"44 3","pages":"1153-1167"},"PeriodicalIF":0.0,"publicationDate":"2024-11-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142585413","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Corrections to “Contrastive Graph Pooling for Explainable Classification of Brain Networks”","authors":"Jiaxing Xu;Qingtian Bian;Xinhang Li;Aihu Zhang;Yiping Ke;Miao Qiao;Wei Zhang;Wei Khang Jeremy Sim;Balázs Gulyás","doi":"10.1109/TMI.2024.3465968","DOIUrl":"10.1109/TMI.2024.3465968","url":null,"abstract":"","PeriodicalId":94033,"journal":{"name":"IEEE transactions on medical imaging","volume":"43 11","pages":"4075-4075"},"PeriodicalIF":0.0,"publicationDate":"2024-11-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=10741900","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142577333","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Multi-Center Fetal Brain Tissue Annotation (FeTA) Challenge 2022 Results","authors":"Kelly Payette;Céline Steger;Roxane Licandro;Priscille de Dumast;Hongwei Bran Li;Matthew Barkovich;Liu Li;Maik Dannecker;Chen Chen;Cheng Ouyang;Niccolò McConnell;Alina Miron;Yongmin Li;Alena Uus;Irina Grigorescu;Paula Ramirez Gilliland;Md Mahfuzur Rahman Siddiquee;Daguang Xu;Andriy Myronenko;Haoyu Wang;Ziyan Huang;Jin Ye;Mireia Alenyà;Valentin Comte;Oscar Camara;Jean-Baptiste Masson;Astrid Nilsson;Charlotte Godard;Moona Mazher;Abdul Qayyum;Yibo Gao;Hangqi Zhou;Shangqi Gao;Jia Fu;Guiming Dong;Guotai Wang;ZunHyan Rieu;HyeonSik Yang;Minwoo Lee;Szymon Płotka;Michal K. Grzeszczyk;Arkadiusz Sitek;Luisa Vargas Daza;Santiago Usma;Pablo Arbelaez;Wenying Lu;Wenhao Zhang;Jing Liang;Romain Valabregue;Anand A. Joshi;Krishna N. Nayak;Richard M. Leahy;Luca Wilhelmi;Aline Dändliker;Hui Ji;Antonio G. Gennari;Anton Jakovčić;Melita Klaić;Ana Adžić;Pavel Marković;Gracia Grabarić;Gregor Kasprian;Gregor Dovjak;Milan Rados;Lana Vasung;Meritxell Bach Cuadra;Andras Jakab","doi":"10.1109/TMI.2024.3485554","DOIUrl":"10.1109/TMI.2024.3485554","url":null,"abstract":"Segmentation is a critical step in analyzing the developing human fetal brain. There have been vast improvements in automatic segmentation methods in the past several years, and the Fetal Brain Tissue Annotation (FeTA) Challenge 2021 helped to establish an excellent standard of fetal brain segmentation. However, FeTA 2021 was a single center study, limiting real-world clinical applicability and acceptance. The multi-center FeTA Challenge 2022 focused on advancing the generalizability of fetal brain segmentation algorithms for magnetic resonance imaging (MRI). In FeTA 2022, the training dataset contained images and corresponding manually annotated multi-class labels from two imaging centers, and the testing data contained images from these two centers as well as two additional unseen centers. 
The multi-center data included different MR scanners, imaging parameters, and applied fetal brain super-resolution algorithms. Sixteen teams participated, and 17 algorithms were evaluated. Here, the challenge results are presented, focusing on the generalizability of the submissions. Both in- and out-of-domain, the white matter and ventricles were segmented with the highest accuracy (top Dice scores: 0.89 and 0.87, respectively), while the most challenging structure remains the grey matter (top Dice score: 0.75) due to its anatomical complexity. The top-5 average Dice scores ranged from 0.81 to 0.82, the top-5 average <inline-formula> <tex-math>$95^{\\text{th}}$ </tex-math></inline-formula> percentile Hausdorff distance values ranged from 2.3 to 2.5 mm, and the top-5 volumetric similarity scores ranged from 0.90 to 0.92. The FeTA Challenge 2022 successfully evaluated and advanced the generalizability of multi-class fetal brain tissue segmentation algorithms for MRI, and it continues to benchmark new algorithms.","PeriodicalId":94033,"journal":{"name":"IEEE transactions on medical imaging","volume":"44 3","pages":"1257-1272"},"PeriodicalIF":0.0,"publicationDate":"2024-10-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=10738483","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142549774","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"ShapeMed-Knee: A Dataset and Neural Shape Model Benchmark for Modeling 3D Femurs","authors":"Anthony A. Gatti;Louis Blankemeier;Dave Van Veen;Brian Hargreaves;Scott L. Delp;Garry E. Gold;Feliks Kogan;Akshay S. Chaudhari","doi":"10.1109/TMI.2024.3485613","DOIUrl":"10.1109/TMI.2024.3485613","url":null,"abstract":"Analyzing anatomic shapes of tissues and organs is pivotal for accurate disease diagnostics and clinical decision-making. One prominent disease that depends on anatomic shape analysis is osteoarthritis, which affects 30 million Americans. To advance osteoarthritis diagnostics and prognostics, we introduce ShapeMed-Knee, a 3D shape dataset with 9,376 high-resolution, medical-imaging-based 3D shapes of both femur bone and cartilage. Besides data, ShapeMed-Knee includes two benchmarks for assessing reconstruction accuracy and five clinical prediction tasks that assess the utility of learned shape representations. Leveraging ShapeMed-Knee, we develop and evaluate a novel hybrid explicit-implicit neural shape model which achieves up to 40% better reconstruction accuracy than a statistical shape model and two implicit neural shape models. Our hybrid models achieve state-of-the-art performance for preserving cartilage biomarkers (root mean squared error ≤ 0.05 vs. ≤ 0.07, 0.10, and 0.14). Our models are also the first to successfully predict localized structural features of osteoarthritis, outperforming shape models and convolutional neural networks applied to raw magnetic resonance images and segmentations (e.g., osteophyte size and localization 63% accuracy vs. 49-61%). The ShapeMed-Knee dataset provides medical evaluations to reconstruct multiple anatomic surfaces and embed meaningful disease-specific information. ShapeMed-Knee reduces barriers to applying 3D modeling in medicine, and our benchmarks highlight that advancements in 3D modeling can enhance the diagnosis and risk stratification for complex diseases. 
The dataset, code, and benchmarks are freely accessible.","PeriodicalId":94033,"journal":{"name":"IEEE transactions on medical imaging","volume":"44 3","pages":"1140-1152"},"PeriodicalIF":0.0,"publicationDate":"2024-10-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142490295","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Asynchronous Functional Brain Network Construction With Spatiotemporal Transformer for MCI Classification","authors":"Jianjia Zhang;Xiaotong Wu;Xiang Tang;Luping Zhou;Lei Wang;Weiwen Wu;Dinggang Shen","doi":"10.1109/TMI.2024.3486086","DOIUrl":"10.1109/TMI.2024.3486086","url":null,"abstract":"Construction and analysis of functional brain networks (FBNs) with resting-state functional magnetic resonance imaging (rs-fMRI) is a promising method to diagnose functional brain diseases. Nevertheless, the existing methods suffer from several limitations. First, the functional connectivities (FCs) of the FBN are usually measured by the temporal co-activation level between rs-fMRI time series from regions of interest (ROIs). While enjoying simplicity, the existing approach implicitly assumes simultaneous co-activation of all the ROIs, and models only their synchronous dependencies. However, the FCs are not necessarily always synchronous due to the time lag of information flow and cross-time interactions between ROIs. Therefore, it is desirable to model asynchronous FCs. Second, the traditional methods usually construct FBNs at individual level, leading to large variability and degraded diagnosis accuracy when modeling asynchronous FBN. Third, the FBN construction and analysis are conducted in two independent steps without joint alignment for the target diagnosis task. To address the first limitation, this paper proposes an effective sliding-window-based method to model spatiotemporal FCs in Transformer. Regarding the second limitation, we propose to learn common and individual FBNs adaptively with the common FBN as prior knowledge, thus alleviating the variability and enabling the network to focus on the individual disease-specific asynchronous FCs. To address the third limitation, the common and individual asynchronous FBNs are built and analyzed by an integrated network, enabling end-to-end training and improving the flexibility and discriminability. 
The effectiveness of the proposed method is consistently demonstrated on three data sets for mild cognitive impairment (MCI) diagnosis.","PeriodicalId":94033,"journal":{"name":"IEEE transactions on medical imaging","volume":"44 3","pages":"1168-1180"},"PeriodicalIF":0.0,"publicationDate":"2024-10-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142489520","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Time-Resolved Laser Speckle Contrast Imaging (TR-LSCI) of Cerebral Blood Flow","authors":"Faraneh Fathi;Siavash Mazdeyasna;Dara Singh;Chong Huang;Mehrana Mohtasebi;Xuhui Liu;Samaneh Rabienia Haratbar;Mingjun Zhao;Li Chen;Arin Can Ulku;Paul Mos;Claudio Bruschini;Edoardo Charbon;Lei Chen;Guoqiang Yu","doi":"10.1109/TMI.2024.3486084","DOIUrl":"10.1109/TMI.2024.3486084","url":null,"abstract":"To address many of the deficiencies in optical neuroimaging technologies, such as poor tempo-spatial resolution, low penetration depth, contact-based measurement, and time-consuming image reconstruction, a novel, noncontact, portable, time-resolved laser speckle contrast imaging (TR-LSCI) technique has been developed for continuous, fast, and high-resolution 2D mapping of cerebral blood flow (CBF) at different depths of the head. TR-LSCI illuminates the head with picosecond-pulsed, coherent, widefield near-infrared light and synchronizes a fast, high-resolution, gated single-photon avalanche diode camera to selectively collect diffuse photons with longer pathlengths through the head, thus improving the accuracy of CBF measurement in the deep brain. The reconstruction of a CBF map was dramatically expedited by incorporating convolution functions with parallel computations. The performance of TR-LSCI was evaluated using head-simulating phantoms with known properties and in-vivo rodents with varied hemodynamic challenges to the brain. TR-LSCI enabled mapping CBF variations at different depths with a sampling rate of up to 1 Hz and spatial resolutions ranging from tens/hundreds of micrometers on rodent head surfaces to 1-2 millimeters in deep brains. 
With additional improvements and validation in larger populations against established methods, we anticipate offering a noncontact, fast, high-resolution, portable, and affordable brain imager for fundamental neuroscience research in animals and for translational studies in humans.","PeriodicalId":94033,"journal":{"name":"IEEE transactions on medical imaging","volume":"44 3","pages":"1206-1217"},"PeriodicalIF":0.0,"publicationDate":"2024-10-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142489601","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"CoD-MIL: Chain-of-Diagnosis Prompting Multiple Instance Learning for Whole Slide Image Classification","authors":"Jiangbo Shi;Chen Li;Tieliang Gong;Chunbao Wang;Huazhu Fu","doi":"10.1109/TMI.2024.3485120","DOIUrl":"10.1109/TMI.2024.3485120","url":null,"abstract":"Multiple instance learning (MIL) has emerged as a prominent paradigm for processing the whole slide image with pyramid structure and giga-pixel size in digital pathology. However, existing attention-based MIL methods are primarily trained on the image modality and a pre-defined label set, leading to limited generalization and interpretability. Recently, vision language models (VLM) have achieved promising performance and transferability, offering potential solutions to the limitations of MIL-based methods. Pathological diagnosis is an intricate process that requires pathologists to examine the WSI step-by-step. In the field of natural language process, the chain-of-thought (CoT) prompting method is widely utilized to imitate the human reasoning process. Inspired by the CoT prompt and pathologists’ clinic knowledge, we propose a chain-of-diagnosis prompting multiple instance learning (CoD-MIL) framework for whole slide image classification. Specifically, the chain-of-diagnosis text prompt decomposes the complex diagnostic process in WSI into progressive sub-processes from low to high magnification. Additionally, we propose a text-guided contrastive masking module to accurately localize the tumor region by masking the most discriminative instances and introducing the guidance of normal tissue texts in a contrastive way. 
Extensive experiments conducted on three real-world subtyping datasets demonstrate the effectiveness and superiority of CoD-MIL.","PeriodicalId":94033,"journal":{"name":"IEEE transactions on medical imaging","volume":"44 3","pages":"1218-1229"},"PeriodicalIF":0.0,"publicationDate":"2024-10-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142488344","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}