{"title":"Shapley Values-enabled Progressive Pseudo Bag Augmentation for Whole-Slide Image Classification.","authors":"Renao Yan, Qiehe Sun, Cheng Jin, Yiqing Liu, Yonghong He, Tian Guan, Hao Chen","doi":"10.1109/TMI.2024.3453386","DOIUrl":"10.1109/TMI.2024.3453386","url":null,"abstract":"<p><p>In computational pathology, whole-slide image (WSI) classification presents a formidable challenge due to its gigapixel resolution and limited fine-grained annotations. Multiple-instance learning (MIL) offers a weakly supervised solution, yet refining instance-level information from bag-level labels remains challenging. While most of the conventional MIL methods use attention scores to estimate instance importance scores (IIS) which contribute to the prediction of the slide labels, these often lead to skewed attention distributions and inaccuracies in identifying crucial instances. To address these issues, we propose a new approach inspired by cooperative game theory: employing Shapley values to assess each instance's contribution, thereby improving IIS estimation. The computation of the Shapley value is then accelerated using attention, meanwhile retaining the enhanced instance identification and prioritization. We further introduce a framework for the progressive assignment of pseudo bags based on estimated IIS, encouraging more balanced attention distributions in MIL models. Our extensive experiments on CAMELYON-16, BRACS, TCGA-LUNG, and TCGA-BRCA datasets show our method's superiority over existing state-of-the-art approaches, offering enhanced interpretability and class-wise insights. We will release the code upon acceptance.</p>","PeriodicalId":94033,"journal":{"name":"IEEE transactions on medical imaging","volume":"PP ","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-09-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142121431","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Vishnuvardhan Purma, Suhas Srinath, Seshan Srirangarajan, Aanchal Kakkar, A P Prathosh
{"title":"GenSelfDiff-HIS: Generative Self-Supervision Using Diffusion for Histopathological Image Segmentation.","authors":"Vishnuvardhan Purma, Suhas Srinath, Seshan Srirangarajan, Aanchal Kakkar, A P Prathosh","doi":"10.1109/TMI.2024.3453492","DOIUrl":"https://doi.org/10.1109/TMI.2024.3453492","url":null,"abstract":"<p><p>Histopathological image segmentation is a laborious and time-intensive task, often requiring analysis from experienced pathologists for accurate examinations. To reduce this burden, supervised machine-learning approaches have been adopted using large-scale annotated datasets for histopathological image analysis. However, in several scenarios, the availability of large-scale annotated data is a bottleneck while training such models. Self-supervised learning (SSL) is an alternative paradigm that provides some respite by constructing models utilizing only the unannotated data which is often abundant. The basic idea of SSL is to train a network to perform one or many pseudo or pretext tasks on unannotated data and use it subsequently as the basis for a variety of downstream tasks. It is seen that the success of SSL depends critically on the considered pretext task. While there have been many efforts in designing pretext tasks for classification problems, there have not been many attempts on SSL for histopathological image segmentation. Motivated by this, we propose an SSL approach for segmenting histopathological images via generative diffusion models. Our method is based on the observation that diffusion models effectively solve an image-to-image translation task akin to a segmentation task. Hence, we propose generative diffusion as the pretext task for histopathological image segmentation. We also utilize a multi-loss function-based fine-tuning for the downstream task. We validate our method using several metrics on two publicly available datasets along with a newly proposed head and neck (HN) cancer dataset containing Hematoxylin and Eosin (H&E) stained images along with annotations.</p>","PeriodicalId":94033,"journal":{"name":"IEEE transactions on medical imaging","volume":"PP ","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-09-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142121429","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Xuegang Song, Kaixiang Shu, Peng Yang, Cheng Zhao, Feng Zhou, Alejandro F Frangi, Xiaohua Xiao, Lei Dong, Tianfu Wang, Shuqiang Wang, Baiying Lei
{"title":"Knowledge-aware Multisite Adaptive Graph Transformer for Brain Disorder Diagnosis.","authors":"Xuegang Song, Kaixiang Shu, Peng Yang, Cheng Zhao, Feng Zhou, Alejandro F Frangi, Xiaohua Xiao, Lei Dong, Tianfu Wang, Shuqiang Wang, Baiying Lei","doi":"10.1109/TMI.2024.3453419","DOIUrl":"https://doi.org/10.1109/TMI.2024.3453419","url":null,"abstract":"<p><p>Brain disorder diagnosis via resting-state functional magnetic resonance imaging (rs-fMRI) is usually limited due to the complex imaging features and sample size. For brain disorder diagnosis, the graph convolutional network (GCN) has achieved remarkable success by capturing interactions between individuals and the population. However, there are mainly three limitations: 1) The previous GCN approaches consider the non-imaging information in edge construction but ignore the sensitivity differences of features to non-imaging information. 2) The previous GCN approaches solely focus on establishing interactions between subjects (i.e., individuals and the population), disregarding the essential relationship between features. 3) Multisite data increase the sample size to help classifier training, but the inter-site heterogeneity limits the performance to some extent. This paper proposes a knowledge-aware multisite adaptive graph Transformer to address the above problems. First, we evaluate the sensitivity of features to each piece of non-imaging information, and then construct feature-sensitive and feature-insensitive subgraphs. Second, after fusing the above subgraphs, we integrate a Transformer module to capture the intrinsic relationship between features. Third, we design a domain adaptive GCN using multiple loss function terms to relieve data heterogeneity and to produce the final classification results. Last, the proposed framework is validated on two brain disorder diagnostic tasks. Experimental results show that the proposed framework can achieve state-of-the-art performance.</p>","PeriodicalId":94033,"journal":{"name":"IEEE transactions on medical imaging","volume":"PP ","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-09-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142121430","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Rui Feng, Jingwen Yang, Hao Huang, Zelin Chen, Ruiyan Feng, N U Farrukh Hameed, Xudong Zhang, Jie Hu, Liang Chen, Shuo Lu
{"title":"Spatiotemporal Microstate Dynamics of Spike-free Scalp EEG Offer a Potential Biomarker for Refractory Temporal Lobe Epilepsy.","authors":"Rui Feng, Jingwen Yang, Hao Huang, Zelin Chen, Ruiyan Feng, N U Farrukh Hameed, Xudong Zhang, Jie Hu, Liang Chen, Shuo Lu","doi":"10.1109/TMI.2024.3453377","DOIUrl":"https://doi.org/10.1109/TMI.2024.3453377","url":null,"abstract":"<p><p>Refractory temporal lobe epilepsy (TLE) is one of the most frequently observed subtypes of epilepsy and endangers more than 50 million people world-wide. Although electroencephalogram (EEG) had been widely recognized as a classic tool to screen and diagnose epilepsy, for many years it heavily relied on identifying epileptic discharges and epileptogenic zone localization, which however, limits the understanding of refractory epilepsy due to the network nature of this disease. This work hypothesizes that the microstate dynamics based on resting-state scalp EEG can offer an additional network depiction of the disease and provide potential complementary evaluation tool for the TLE even without detectable epileptic discharges on EEG. We propose a novel framework for EEG microstate spatial-temporal dynamics (EEG-MiSTD) analysis based on machine learning to comprehensively model millisecond-changing whole-brain network dynamics. With only 100 seconds of resting-state EEG even without epileptic discharges, this approach successfully distinguishes TLE patients from healthy controls and is related to the lateralization of epileptic focus. Besides, microstate temporal and spatial features are found to be widely related to clinical parameters, which further demonstrate that TLE is a network disease. A preliminary exploration suggests that the spatial topography is sensitive to the following surgical outcomes. From such a new perspective, our results suggest that spatiotemporal microstate dynamics is potentially a biomarker of the disease. The developed EEG-MiSTD framework can probably be considered as a general tool to examine dynamical brain network disruption in a user-friendly way for other types of epilepsy.</p>","PeriodicalId":94033,"journal":{"name":"IEEE transactions on medical imaging","volume":"PP ","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-09-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142121432","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Multi-Modal Federated Learning for Cancer Staging over Non-IID Datasets with Unbalanced Modalities.","authors":"Kasra Borazjani, Naji Khosravan, Leslie Ying, Seyyedali Hosseinalipour","doi":"10.1109/TMI.2024.3450855","DOIUrl":"https://doi.org/10.1109/TMI.2024.3450855","url":null,"abstract":"<p><p>The use of machine learning (ML) for cancer staging through medical image analysis has gained substantial interest across medical disciplines. When accompanied by the innovative federated learning (FL) framework, ML techniques can further overcome privacy concerns related to patient data exposure. Given the frequent presence of diverse data modalities within patient records, leveraging FL in a multi-modal learning framework holds considerable promise for cancer staging. However, existing works on multi-modal FL often presume that all data-collecting institutions have access to all data modalities. This oversimplified approach neglects institutions that have access to only a portion of data modalities within the system. In this work, we introduce a novel FL architecture designed to accommodate not only the heterogeneity of data samples, but also the inherent heterogeneity/non-uniformity of data modalities across institutions. We shed light on the challenges associated with varying convergence speeds observed across different data modalities within our FL system. Subsequently, we propose a solution to tackle these challenges by devising a distributed gradient blending and proximity-aware client weighting strategy tailored for multi-modal FL. To show the superiority of our method, we conduct experiments using The Cancer Genome Atlas program (TCGA) datalake considering different cancer types and three modalities of data: mRNA sequences, histopathological image data, and clinical information. Our results further unveil the impact and severity of class-based vs type-based heterogeneity across institutions on the model performance, which widens the perspective to the notion of data heterogeneity in multi-modal FL literature.</p>","PeriodicalId":94033,"journal":{"name":"IEEE transactions on medical imaging","volume":"PP ","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-08-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142086428","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Sen Wang, Yirong Yang, Grant M Stevens, Zhye Yin, Adam S Wang
{"title":"Emulating Low-Dose PCCT Image Pairs with Independent Noise for Self-Supervised Spectral Image Denoising.","authors":"Sen Wang, Yirong Yang, Grant M Stevens, Zhye Yin, Adam S Wang","doi":"10.1109/TMI.2024.3449817","DOIUrl":"https://doi.org/10.1109/TMI.2024.3449817","url":null,"abstract":"<p><p>Photon counting CT (PCCT) acquires spectral measurements and enables generation of material decomposition (MD) images that provide distinct advantages in various clinical situations. However, noise amplification is observed in MD images, and denoising is typically applied. Clean or high-quality references are rare in clinical scans, often making supervised learning (Noise2Clean) impractical. Noise2Noise is a self-supervised counterpart, using noisy images and corresponding noisy references with zero-mean, independent noise. PCCT counts transmitted photons separately, and raw measurements are assumed to follow a Poisson distribution in each energy bin, providing the possibility to create noise-independent pairs. The approach is to use binomial selection to split the counts into two low-dose scans with independent noise. We prove that the reconstructed spectral images inherit the noise independence from counts domain through noise propagation analysis and also validated it in numerical simulation and experimental phantom scans. The method offers the flexibility to split measurements into desired dose levels while ensuring the reconstructed images share identical underlying features, thereby strengthening the model's robustness for input dose levels and capability of preserving fine details. In both numerical simulation and experimental phantom scans, we demonstrated that Noise2Noise with binomial selection outperforms other common self-supervised learning methods based on different presumptive conditions.</p>","PeriodicalId":94033,"journal":{"name":"IEEE transactions on medical imaging","volume":"PP ","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-08-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142086427","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"IMITATE: Clinical Prior Guided Hierarchical Vision-Language Pre-training.","authors":"Che Liu, Sibo Cheng, Miaojing Shi, Anand Shah, Wenjia Bai, Rossella Arcucci","doi":"10.1109/TMI.2024.3449690","DOIUrl":"https://doi.org/10.1109/TMI.2024.3449690","url":null,"abstract":"<p><p>In the field of medical Vision-Language Pretraining (VLP), significant efforts have been devoted to deriving text and image features from both clinical reports and associated medical images. However, most existing methods may have overlooked the opportunity in leveraging the inherent hierarchical structure of clinical reports, which are generally split into 'findings' for descriptive content and 'impressions' for conclusive observation. Instead of utilizing this rich, structured format, current medical VLP approaches often simplify the report into either a unified entity or fragmented tokens. In this work, we propose a novel clinical prior guided VLP framework named IMITATE to learn the structure information from medical reports with hierarchical vision-language alignment. The framework derives multi-level visual features from the chest X-ray (CXR) images and separately aligns these features with the descriptive and the conclusive text encoded in the hierarchical medical report. Furthermore, a new clinical-informed contrastive loss is introduced for cross-modal learning, which accounts for clinical prior knowledge in formulating sample correlations in contrastive learning. The proposed model, IMITATE, outperforms baseline VLP methods across six different datasets, spanning five medical imaging downstream tasks. Comprehensive experimental results highlight the advantages of integrating the hierarchical structure of medical reports for vision-language alignment.</p>","PeriodicalId":94033,"journal":{"name":"IEEE transactions on medical imaging","volume":"PP ","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-08-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142074850","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Sunggu Kyung, Jongjun Won, Seongyong Pak, Sunwoo Kim, Sangyoon Lee, Kanggil Park, Gil-Sun Hong, Namkug Kim
{"title":"Generative Adversarial Network with Robust Discriminator Through Multi-Task Learning for Low-Dose CT Denoising.","authors":"Sunggu Kyung, Jongjun Won, Seongyong Pak, Sunwoo Kim, Sangyoon Lee, Kanggil Park, Gil-Sun Hong, Namkug Kim","doi":"10.1109/TMI.2024.3449647","DOIUrl":"https://doi.org/10.1109/TMI.2024.3449647","url":null,"abstract":"<p><p>Reducing the dose of radiation in computed tomography (CT) is vital to decreasing secondary cancer risk. However, the use of low-dose CT (LDCT) images is accompanied by increased noise that can negatively impact diagnoses. Although numerous deep learning algorithms have been developed for LDCT denoising, several challenges persist, including the visual incongruence experienced by radiologists, unsatisfactory performances across various metrics, and insufficient exploration of the networks' robustness in other CT domains. To address such issues, this study proposes three novel accretions. First, we propose a generative adversarial network (GAN) with a robust discriminator through multi-task learning that simultaneously performs three vision tasks: restoration, image-level, and pixel-level decisions. The more multi-tasks that are performed, the better the denoising performance of the generator, which means multi-task learning enables the discriminator to provide more meaningful feedback to the generator. Second, two regulatory mechanisms, restoration consistency (RC) and non-difference suppression (NDS), are introduced to improve the discriminator's representation capabilities. These mechanisms eliminate irrelevant regions and compare the discriminator's results from the input and restoration, thus facilitating effective GAN training. Lastly, we incorporate residual fast Fourier transforms with convolution (Res-FFT-Conv) blocks into the generator to utilize both frequency and spatial representations. This approach provides mixed receptive fields by using spatial (or local), spectral (or global), and residual connections. Our model was evaluated using various pixel- and feature-space metrics in two denoising tasks. Additionally, we conducted visual scoring with radiologists. The results indicate superior performance in both quantitative and qualitative measures compared to state-of-the-art denoising techniques.</p>","PeriodicalId":94033,"journal":{"name":"IEEE transactions on medical imaging","volume":"PP ","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-08-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142074849","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"BCNet: Bronchus Classification via Structure Guided Representation Learning.","authors":"Wenhao Huang, Haifan Gong, Huan Zhang, Yu Wang, Xiang Wan, Guanbin Li, Haofeng Li, Hong Shen","doi":"10.1109/TMI.2024.3448468","DOIUrl":"https://doi.org/10.1109/TMI.2024.3448468","url":null,"abstract":"<p><p>CT-based bronchial tree analysis is a key step for the diagnosis of lung and airway diseases. However, the topology of bronchial trees varies across individuals, which presents a challenge to the automatic bronchus classification. To solve this issue, we propose the Bronchus Classification Network (BCNet), a structure-guided framework that exploits the segment-level topological information using point clouds to learn the voxel-level features. BCNet has two branches, a Point-Voxel Graph Neural Network (PV-GNN) for segment classification, and a Convolutional Neural Network (CNN) for voxel labeling. The two branches are simultaneously trained to learn topology-aware features for their shared backbone while it is feasible to run only the CNN branch for the inference. Therefore, BCNet maintains the same inference efficiency as its CNN baseline. Experimental results show that BCNet significantly exceeds the state-of-the-art methods by over 8.0% both on F1-score for classifying bronchus. Furthermore, we contribute BronAtlas: an open-access benchmark of bronchus imaging analysis with high-quality voxel-wise annotations of both anatomical and abnormal bronchial segments. The benchmark is available at link<sup>1</sup>.</p>","PeriodicalId":94033,"journal":{"name":"IEEE transactions on medical imaging","volume":"PP ","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-08-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142044270","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Kunming Tang, Zhiguo Jiang, Kun Wu, Jun Shi, Fengying Xie, Wei Wang, Haibo Wu, Yushan Zheng
{"title":"Self-Supervised Representation Distribution Learning for Reliable Data Augmentation in Histopathology WSI Classification.","authors":"Kunming Tang, Zhiguo Jiang, Kun Wu, Jun Shi, Fengying Xie, Wei Wang, Haibo Wu, Yushan Zheng","doi":"10.1109/TMI.2024.3447672","DOIUrl":"https://doi.org/10.1109/TMI.2024.3447672","url":null,"abstract":"<p><p>Multiple instance learning (MIL) based whole slide image (WSI) classification is often carried out on the representations of patches extracted from WSI with a pre-trained patch encoder. The performance of classification relies on both patch-level representation learning and MIL classifier training. Most MIL methods utilize a frozen model pre-trained on ImageNet or a model trained with self-supervised learning on histopathology image dataset to extract patch image representations and then fix these representations in the training of the MIL classifiers for efficiency consideration. However, the invariance of representations cannot meet the diversity requirement for training a robust MIL classifier, which has significantly limited the performance of the WSI classification. In this paper, we propose a Self-Supervised Representation Distribution Learning framework (SSRDL) for patch-level representation learning with an online representation sampling strategy (ORS) for both patch feature extraction and WSI-level data augmentation. The proposed method was evaluated on three datasets under three MIL frameworks. The experimental results have demonstrated that the proposed method achieves the best performance in histopathology image representation learning and data augmentation and outperforms state-of-the-art methods under different WSI classification frameworks. The code is available at https://github.com/lazytkm/SSRDL.</p>","PeriodicalId":94033,"journal":{"name":"IEEE transactions on medical imaging","volume":"PP ","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-08-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142037962","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}