Rafic Nader, Florent Autrusseau, Vincent L'Allinec, Romain Bourcier
{"title":"Building a Synthetic Vascular Model: Evaluation in an Intracranial Aneurysms Detection Scenario.","authors":"Rafic Nader, Florent Autrusseau, Vincent L'Allinec, Romain Bourcier","doi":"10.1109/TMI.2024.3492313","DOIUrl":"https://doi.org/10.1109/TMI.2024.3492313","url":null,"abstract":"<p><p>We hereby present a full synthetic model, able to mimic the various constituents of the cerebral vascular tree, including the cerebral arteries, bifurcations and intracranial aneurysms. This model intends to provide a substantial dataset of brain arteries which could be used by a 3D convolutional neural network to efficiently detect Intra-Cranial Aneurysms. The cerebral aneurysms most often occur on a particular structure of the vascular tree named the Circle of Willis. Various studies have been conducted to detect and monitor the aneurysms and those based on Deep Learning achieve the best performance. Specifically, in this work, we propose a full synthetic 3D model able to mimic the brain vasculature as acquired by Magnetic Resonance Angiography, Time Of Flight principle. Among the various MRI modalities, this latter allows for a good rendering of the blood vessels and is non-invasive. Our model has been designed to simultaneously mimic the arteries' geometry, the aneurysm shape, and the background noise. The vascular tree geometry is modeled thanks to an interpolation with 3D Spline functions, and the statistical properties of the background noise are collected from angiography acquisitions and reproduced within the model. 
In this work, we thoroughly describe the synthetic vasculature model, build a neural network designed for aneurysm segmentation and detection, and finally carry out an in-depth evaluation of the performance gain obtained through data augmentation with the synthetic model.</p>","PeriodicalId":94033,"journal":{"name":"IEEE transactions on medical imaging","volume":"PP ","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-11-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142591694","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Yiyao Liu, Jinyao Li, Cheng Zhao, Yongtao Zhang, Qian Chen, Jing Qin, Lei Dong, Tianfu Wang, Wei Jiang, Baiying Lei
{"title":"FAMF-Net: Feature Alignment Mutual Attention Fusion with Region Awareness for Breast Cancer Diagnosis via Imbalanced Data.","authors":"Yiyao Liu, Jinyao Li, Cheng Zhao, Yongtao Zhang, Qian Chen, Jing Qin, Lei Dong, Tianfu Wang, Wei Jiang, Baiying Lei","doi":"10.1109/TMI.2024.3485612","DOIUrl":"https://doi.org/10.1109/TMI.2024.3485612","url":null,"abstract":"<p><p>Automatic and accurate classification of breast cancer in multimodal ultrasound images is crucial to improve patients' diagnosis and treatment effect and save medical resources. Methodologically, the fusion of multimodal ultrasound images often encounters challenges such as misalignment, limited utilization of complementary information, poor interpretability in feature fusion, and imbalances in sample categories. To solve these problems, we propose a feature alignment mutual attention fusion method (FAMF-Net), which consists of a region awareness alignment (RAA) block, a mutual attention fusion (MAF) block, and a reinforcement learning-based dynamic optimization strategy (RDO). Specifically, RAA achieves region awareness through class activation mapping and performs translation transformation to achieve feature alignment. When MAF utilizes a mutual attention mechanism for feature interaction fusion, it mines edge and color features separately in B-mode and shear wave elastography images, enhancing the complementarity of features and improving interpretability. Finally, RDO uses the distribution of samples and prediction probabilities during training as the state of reinforcement learning to dynamically optimize the weights of the loss function, thereby solving the problem of class imbalance. The experimental results based on our clinically obtained dataset demonstrate the effectiveness of the proposed method. 
Our code will be available at: https://github.com/Magnety/Multi_modal_Image.</p>","PeriodicalId":94033,"journal":{"name":"IEEE transactions on medical imaging","volume":"PP ","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-11-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142585413","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Corrections to “Contrastive Graph Pooling for Explainable Classification of Brain Networks”","authors":"Jiaxing Xu;Qingtian Bian;Xinhang Li;Aihu Zhang;Yiping Ke;Miao Qiao;Wei Zhang;Wei Khang Jeremy Sim;Balázs Gulyás","doi":"10.1109/TMI.2024.3465968","DOIUrl":"10.1109/TMI.2024.3465968","url":null,"abstract":"","PeriodicalId":94033,"journal":{"name":"IEEE transactions on medical imaging","volume":"43 11","pages":"4075-4075"},"PeriodicalIF":0.0,"publicationDate":"2024-11-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=10741900","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142577333","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Kelly Payette, Celine Steger, Roxane Licandro, Priscille De Dumast, Hongwei Bran Li, Matthew Barkovich, Liu Li, Maik Dannecker, Chen Chen, Cheng Ouyang, Niccolo McConnell, Alina Miron, Yongmin Li, Alena Uus, Irina Grigorescu, Paula Ramirez Gilliland, Md Mahfuzur Rahman Siddiquee, Daguang Xu, Andriy Myronenko, Haoyu Wang, Ziyan Huang, Jin Ye, Mireia Alenya, Valentin Comte, Oscar Camara, Jean-Baptiste Masson, Astrid Nilsson, Charlotte Godard, Moona Mazher, Abdul Qayyum, Yibo Gao, Hangqi Zhou, Shangqi Gao, Jia Fu, Guiming Dong, Guotai Wang, ZunHyan Rieu, HyeonSik Yang, Minwoo Lee, Szymon Plotka, Michal K Grzeszczyk, Arkadiusz Sitek, Luisa Vargas Daza, Santiago Usma, Pablo Arbelaez, Wenying Lu, Wenhao Zhang, Jing Liang, Romain Valabregue, Anand A Joshi, Krishna N Nayak, Richard M Leahy, Luca Wilhelmi, Aline Dandliker, Hui Ji, Antonio G Gennari, Anton Jakovcic, Melita Klaic, Ana Adzic, Pavel Markovic, Gracia Grabaric, Gregor Kasprian, Gregor Dovjak, Milan Rados, Lana Vasung, Meritxell Bach Cuadra, Andras Jakab
{"title":"Multi-Center Fetal Brain Tissue Annotation (FeTA) Challenge 2022 Results.","authors":"Kelly Payette, Celine Steger, Roxane Licandro, Priscille De Dumast, Hongwei Bran Li, Matthew Barkovich, Liu Li, Maik Dannecker, Chen Chen, Cheng Ouyang, Niccolo McConnell, Alina Miron, Yongmin Li, Alena Uus, Irina Grigorescu, Paula Ramirez Gilliland, Md Mahfuzur Rahman Siddiquee, Daguang Xu, Andriy Myronenko, Haoyu Wang, Ziyan Huang, Jin Ye, Mireia Alenya, Valentin Comte, Oscar Camara, Jean-Baptiste Masson, Astrid Nilsson, Charlotte Godard, Moona Mazher, Abdul Qayyum, Yibo Gao, Hangqi Zhou, Shangqi Gao, Jia Fu, Guiming Dong, Guotai Wang, ZunHyan Rieu, HyeonSik Yang, Minwoo Lee, Szymon Plotka, Michal K Grzeszczyk, Arkadiusz Sitek, Luisa Vargas Daza, Santiago Usma, Pablo Arbelaez, Wenying Lu, Wenhao Zhang, Jing Liang, Romain Valabregue, Anand A Joshi, Krishna N Nayak, Richard M Leahy, Luca Wilhelmi, Aline Dandliker, Hui Ji, Antonio G Gennari, Anton Jakovcic, Melita Klaic, Ana Adzic, Pavel Markovic, Gracia Grabaric, Gregor Kasprian, Gregor Dovjak, Milan Rados, Lana Vasung, Meritxell Bach Cuadra, Andras Jakab","doi":"10.1109/TMI.2024.3485554","DOIUrl":"https://doi.org/10.1109/TMI.2024.3485554","url":null,"abstract":"<p><p>Segmentation is a critical step in analyzing the developing human fetal brain. There have been vast improvements in automatic segmentation methods in the past several years, and the Fetal Brain Tissue Annotation (FeTA) Challenge 2021 helped to establish an excellent standard of fetal brain segmentation. However, FeTA 2021 was a single center study, limiting real-world clinical applicability and acceptance. The multi-center FeTA Challenge 2022 focused on advancing the generalizability of fetal brain segmentation algorithms for magnetic resonance imaging (MRI). 
In FeTA 2022, the training dataset contained images and corresponding manually annotated multi-class labels from two imaging centers, and the testing data contained images from these two centers as well as two additional unseen centers. The multi-center data included different MR scanners, imaging parameters, and fetal brain super-resolution algorithms applied. 16 teams participated and 17 algorithms were evaluated. Here, the challenge results are presented, focusing on the generalizability of the submissions. Both in- and out-of-domain, the white matter and ventricles were segmented with the highest accuracy (Top Dice scores: 0.89, 0.87 respectively), while the most challenging structure remains the grey matter (Top Dice score: 0.75) due to anatomical complexity. The top 5 average Dice scores ranged from 0.81-0.82, the top 5 average 95<sup>th</sup> percentile Hausdorff distance values ranged from 2.3-2.5 mm, and the top 5 volumetric similarity scores ranged from 0.90-0.92. The FeTA Challenge 2022 was able to successfully evaluate and advance generalizability of multi-class fetal brain tissue segmentation algorithms for MRI and it continues to benchmark new algorithms.</p>","PeriodicalId":94033,"journal":{"name":"IEEE transactions on medical imaging","volume":"PP ","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-10-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142549774","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"CQformer: Learning Dynamics Across Slices in Medical Image Segmentation.","authors":"Shengjie Zhang, Xin Shen, Xiang Chen, Ziqi Yu, Bohan Ren, Haibo Yang, Xiao-Yong Zhang, Yuan Zhou","doi":"10.1109/TMI.2024.3477555","DOIUrl":"https://doi.org/10.1109/TMI.2024.3477555","url":null,"abstract":"<p><p>Prevalent studies on deep learning-based 3D medical image segmentation capture the continuous variation across 2D slices mainly via convolution, Transformer, inter-slice interaction, and time series models. In this work, via modeling this variation by an ordinary differential equation (ODE), we propose a cross instance query-guided Transformer architecture (CQformer) that leverages features from preceding slices to improve the segmentation performance of subsequent slices. Its key components include a cross-attention mechanism in an ODE formulation, which bridges the features of contiguous 2D slices of the 3D volumetric data. In addition, a regression head is employed to shorten the gap between the bottleneck and the prediction layer. Extensive experiments on 7 datasets with various modalities (CT, MRI) and tasks (organ, tissue, and lesion) demonstrate that CQformer outperforms previous state-of-the-art segmentation algorithms on 6 datasets by 0.44%-2.45%, and achieves the second highest performance of 88.30% on the BTCV dataset. 
The code will be publicly available after acceptance.</p>","PeriodicalId":94033,"journal":{"name":"IEEE transactions on medical imaging","volume":"PP ","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-10-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142402485","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Yuqi Tang, Nanchao Wang, Zhijie Dong, Matthew Lowerison, Angela Del Aguila, Natalie Johnston, Tri Vu, Chenshuo Ma, Yirui Xu, Wei Yang, Pengfei Song, Junjie Yao
{"title":"Non-invasive Deep-Brain Imaging with 3D Integrated Photoacoustic Tomography and Ultrasound Localization Microscopy (3D-PAULM).","authors":"Yuqi Tang, Nanchao Wang, Zhijie Dong, Matthew Lowerison, Angela Del Aguila, Natalie Johnston, Tri Vu, Chenshuo Ma, Yirui Xu, Wei Yang, Pengfei Song, Junjie Yao","doi":"10.1109/TMI.2024.3477317","DOIUrl":"10.1109/TMI.2024.3477317","url":null,"abstract":"<p><p>Photoacoustic computed tomography (PACT) is a proven technology for imaging hemodynamics in the deep brain of small animal models. PACT is inherently compatible with ultrasound (US) imaging, providing complementary contrast mechanisms. While PACT can quantify the brain's oxygen saturation of hemoglobin (sO2), US imaging can probe the blood flow based on the Doppler effect. Further, by tracking gas-filled microbubbles, ultrasound localization microscopy (ULM) can map the blood flow velocity with sub-diffraction spatial resolution. In this work, we present a 3D deep-brain imaging system that seamlessly integrates PACT and ULM into a single device, 3D-PAULM. Using a low ultrasound frequency of 4 MHz, 3D-PAULM is capable of imaging the brain hemodynamic functions with intact scalp and skull in a totally non-invasive manner. Using 3D-PAULM, we studied the mouse brain functions with ischemic stroke. Multi-spectral PACT, US B-mode imaging, microbubble-enhanced power Doppler (PD), and ULM were performed on the same mouse brain with intrinsic image co-registration. From the multi-modality measurements, we further quantified blood perfusion, sO2, vessel density, and flow velocity of the mouse brain, showing stroke-induced ischemia, hypoxia, and reduced blood flow. 
We expect that 3D-PAULM can find broad applications in studying deep brain functions on small animal models.</p>","PeriodicalId":94033,"journal":{"name":"IEEE transactions on medical imaging","volume":"PP ","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-10-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142396305","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Yanfeng Zhou, Lingrui Li, Chenlong Wang, Le Song, Ge Yang
{"title":"GobletNet: Wavelet-Based High-Frequency Fusion Network for Semantic Segmentation of Electron Microscopy Images.","authors":"Yanfeng Zhou, Lingrui Li, Chenlong Wang, Le Song, Ge Yang","doi":"10.1109/TMI.2024.3474028","DOIUrl":"https://doi.org/10.1109/TMI.2024.3474028","url":null,"abstract":"<p><p>Semantic segmentation of electron microscopy (EM) images is crucial for nanoscale analysis. With the development of deep neural networks (DNNs), semantic segmentation of EM images has achieved remarkable success. However, current EM image segmentation models are usually extensions or adaptations of natural or biomedical models. They lack the full exploration and utilization of the intrinsic characteristics of EM images. Furthermore, they are often designed only for several specific segmentation objects and lack versatility. In this study, we quantitatively analyze the characteristics of EM images compared with those of natural and other biomedical images via the wavelet transform. To better utilize these characteristics, we design a high-frequency (HF) fusion network, GobletNet, which outperforms state-of-the-art models by a large margin in the semantic segmentation of EM images. We use the wavelet transform to generate HF images as extra inputs and use an extra encoding branch to extract HF information. Furthermore, we introduce a fusion-attention module (FAM) into GobletNet to facilitate better absorption and fusion of information from raw images and HF images. Extensive benchmarking on seven public EM datasets (EPFL, CREMI, SNEMI3D, UroCell, MitoEM, Nanowire and BetaSeg) demonstrates the effectiveness of our model. 
The code is available at https://github.com/Yanfeng-Zhou/GobletNet.</p>","PeriodicalId":94033,"journal":{"name":"IEEE transactions on medical imaging","volume":"PP ","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-10-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142376487","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Zhentao Liu, Yu Fang, Changjian Li, Han Wu, Yuan Liu, Dinggang Shen, Zhiming Cui
{"title":"Geometry-Aware Attenuation Learning for Sparse-View CBCT Reconstruction.","authors":"Zhentao Liu, Yu Fang, Changjian Li, Han Wu, Yuan Liu, Dinggang Shen, Zhiming Cui","doi":"10.1109/TMI.2024.3473970","DOIUrl":"https://doi.org/10.1109/TMI.2024.3473970","url":null,"abstract":"<p><p>Cone Beam Computed Tomography (CBCT) plays a vital role in clinical imaging. Traditional methods typically require hundreds of 2D X-ray projections to reconstruct a high-quality 3D CBCT image, leading to considerable radiation exposure. This has led to a growing interest in sparse-view CBCT reconstruction to reduce radiation doses. While recent advances, including deep learning and neural rendering algorithms, have made strides in this area, these methods either produce unsatisfactory results or suffer from time inefficiency of individual optimization. In this paper, we introduce a novel geometry-aware encoder-decoder framework to solve this problem. Our framework starts by encoding multi-view 2D features from various 2D X-ray projections with a 2D CNN encoder. Leveraging the geometry of CBCT scanning, it then back-projects the multi-view 2D features into the 3D space to formulate a comprehensive volumetric feature map, followed by a 3D CNN decoder to recover the 3D CBCT image. Importantly, our approach respects the geometric relationship between the 3D CBCT image and its 2D X-ray projections during the feature back-projection stage, and enjoys the prior knowledge learned from the data population. This ensures its adaptability in dealing with extremely sparse view inputs without individual training, such as scenarios with only 5 or 10 X-ray projections. 
Extensive evaluations on two simulated datasets and one real-world dataset demonstrate exceptional reconstruction quality and time efficiency of our method.</p>","PeriodicalId":94033,"journal":{"name":"IEEE transactions on medical imaging","volume":"PP ","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-10-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142376486","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Mareike Thies, Fabian Wagner, Noah Maul, Haijun Yu, Manuela Goldmann, Linda-Sophie Schneider, Mingxuan Gu, Siyuan Mei, Lukas Folle, Alexander Preuhs, Michael Manhart, Andreas Maier
{"title":"A gradient-based approach to fast and accurate head motion compensation in cone-beam CT.","authors":"Mareike Thies, Fabian Wagner, Noah Maul, Haijun Yu, Manuela Goldmann, Linda-Sophie Schneider, Mingxuan Gu, Siyuan Mei, Lukas Folle, Alexander Preuhs, Michael Manhart, Andreas Maier","doi":"10.1109/TMI.2024.3474250","DOIUrl":"10.1109/TMI.2024.3474250","url":null,"abstract":"<p><p>Cone-beam computed tomography (CBCT) systems, with their flexibility, present a promising avenue for direct point-of-care medical imaging, particularly in critical scenarios such as acute stroke assessment. However, the integration of CBCT into clinical workflows faces challenges, primarily linked to long scan duration resulting in patient motion during scanning and leading to image quality degradation in the reconstructed volumes. This paper introduces a novel approach to CBCT motion estimation using a gradient-based optimization algorithm, which leverages generalized derivatives of the backprojection operator for cone-beam CT geometries. Building on that, a fully differentiable target function is formulated which grades the quality of the current motion estimate in reconstruction space. We drastically accelerate motion estimation yielding a 19-fold speed-up compared to existing methods. Additionally, we investigate the architecture of networks used for quality metric regression and propose predicting voxel-wise quality maps, favoring autoencoder-like architectures over contracting ones. This modification improves gradient flow, leading to more accurate motion estimation. The presented method is evaluated through realistic experiments on head anatomy. It achieves a reduction in reprojection error from an initial average of 3 mm to 0.61 mm after motion compensation and consistently demonstrates superior performance compared to existing approaches. The analytic Jacobian for the backprojection operation, which is at the core of the proposed method, is made publicly available. 
In summary, this paper contributes to the advancement of CBCT integration into clinical workflows by proposing a robust motion estimation approach that enhances efficiency and accuracy, addressing critical challenges in time-sensitive scenarios.</p>","PeriodicalId":94033,"journal":{"name":"IEEE transactions on medical imaging","volume":"PP ","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-10-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142376485","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Yongjian Wu, Yang Zhou, Jiya Saiyin, Bingzheng Wei, Maode Lai, Jianzhong Shou, Yan Xu
{"title":"AttriPrompter: Auto-Prompting with Attribute Semantics for Zero-shot Nuclei Detection via Visual-Language Pre-trained Models.","authors":"Yongjian Wu, Yang Zhou, Jiya Saiyin, Bingzheng Wei, Maode Lai, Jianzhong Shou, Yan Xu","doi":"10.1109/TMI.2024.3473745","DOIUrl":"10.1109/TMI.2024.3473745","url":null,"abstract":"<p><p>Large-scale visual-language pre-trained models (VLPMs) have demonstrated exceptional performance in downstream object detection through text prompts for natural scenes. However, their application to zero-shot nuclei detection on histopathology images remains relatively unexplored, mainly due to the significant gap between the characteristics of medical images and the web-originated text-image pairs used for pre-training. This paper aims to investigate the potential of the object-level VLPM, Grounded Language-Image Pre-training (GLIP), for zero-shot nuclei detection. Specifically, we propose an innovative auto-prompting pipeline, named AttriPrompter, comprising attribute generation, attribute augmentation, and relevance sorting, to avoid subjective manual prompt design. AttriPrompter utilizes VLPMs' text-to-image alignment to create semantically rich text prompts, which are then fed into GLIP for initial zero-shot nuclei detection. Additionally, we propose a self-trained knowledge distillation framework, where GLIP serves as the teacher with its initial predictions used as pseudo labels, to address the challenges posed by high nuclei density, including missed detections, false positives, and overlapping instances. Our method exhibits remarkable performance in label-free nuclei detection, outperforming all existing unsupervised methods and demonstrating excellent generality. Notably, this work highlights the astonishing potential of VLPMs pre-trained on natural image-text pairs for downstream tasks in the medical field as well. 
Code will be released at github.com/AttriPrompter.</p>","PeriodicalId":94033,"journal":{"name":"IEEE transactions on medical imaging","volume":"PP ","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-10-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142373922","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}