{"title":"Efficient anatomical labeling of pulmonary tree structures via deep point-graph representation-based implicit fields","authors":"Kangxian Xie , Jiancheng Yang , Donglai Wei , Ziqiao Weng , Pascal Fua","doi":"10.1016/j.media.2024.103367","DOIUrl":"10.1016/j.media.2024.103367","url":null,"abstract":"<div><div>Pulmonary diseases rank prominently among the principal causes of death worldwide. Curing them will require, among other things, a better understanding of the complex 3D tree-shaped structures within the pulmonary system, such as airways, arteries, and veins. Traditional approaches using high-resolution image stacks and standard CNNs on dense voxel grids face challenges in computational efficiency, limited resolution, local context, and inadequate preservation of shape topology. Our method addresses these issues by shifting from dense voxel to sparse point representation, offering better memory efficiency and global context utilization. However, the inherent sparsity in point representation can lead to a loss of crucial connectivity in tree-shaped structures. To mitigate this, we introduce graph learning on skeletonized structures, incorporating differentiable feature fusion for improved topology and long-distance context capture. Furthermore, we employ an implicit function for efficient conversion of sparse representations into dense reconstructions end-to-end. The proposed method not only delivers state-of-the-art performance in labeling accuracy, both overall and at key locations, but also enables efficient inference and the generation of closed surface shapes. Addressing data scarcity in this field, we have also curated a comprehensive dataset to validate our approach. Data and code are available at <span><span>https://github.com/M3DV/pulmonary-tree-labeling</span><svg><path></path></svg></span>.</div></div>","PeriodicalId":18328,"journal":{"name":"Medical image analysis","volume":"99 ","pages":"Article 103367"},"PeriodicalIF":10.7,"publicationDate":"2024-10-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142503532","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A survey on cell nuclei instance segmentation and classification: Leveraging context and attention","authors":"João D. Nunes , Diana Montezuma , Domingos Oliveira , Tania Pereira , Jaime S. Cardoso","doi":"10.1016/j.media.2024.103360","DOIUrl":"10.1016/j.media.2024.103360","url":null,"abstract":"<div><div>Nuclear-derived morphological features and biomarkers provide relevant insights regarding the tumour microenvironment, while also allowing diagnosis and prognosis in specific cancer types. However, manually annotating nuclei from the gigapixel Haematoxylin and Eosin (H&E)-stained Whole Slide Images (WSIs) is a laborious and costly task, meaning automated algorithms for cell nuclei instance segmentation and classification could alleviate the workload of pathologists and clinical researchers and at the same time facilitate the automatic extraction of clinically interpretable features for artificial intelligence (AI) tools. But due to high intra- and inter-class variability of nuclei morphological and chromatic features, as well as H&E-stains susceptibility to artefacts, state-of-the-art algorithms cannot correctly detect and classify instances with the necessary performance. In this work, we hypothesize context and attention inductive biases in artificial neural networks (ANNs) could increase the performance and generalization of algorithms for cell nuclei instance segmentation and classification. To understand the advantages, use-cases, and limitations of context and attention-based mechanisms in instance segmentation and classification, we start by reviewing works in computer vision and medical imaging. We then conduct a thorough survey on context and attention methods for cell nuclei instance segmentation and classification from H&E-stained microscopy imaging, while providing a comprehensive discussion of the challenges being tackled with context and attention. Besides, we illustrate some limitations of current approaches and present ideas for future research. As a case study, we extend both a general (Mask-RCNN) and a customized (HoVer-Net) instance segmentation and classification methods with context- and attention-based mechanisms and perform a comparative analysis on a multicentre dataset for colon nuclei identification and counting.</div><div>Although pathologists rely on context at multiple levels while paying attention to specific Regions of Interest (RoIs) when analysing and annotating WSIs, our findings suggest translating that domain knowledge into algorithm design is no trivial task, but to fully exploit these mechanisms in ANNs, the scientific understanding of these methods should first be addressed.</div></div>","PeriodicalId":18328,"journal":{"name":"Medical image analysis","volume":"99 ","pages":"Article 103360"},"PeriodicalIF":10.7,"publicationDate":"2024-10-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142391704","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"LoViT: Long Video Transformer for surgical phase recognition","authors":"Yang Liu , Maxence Boels , Luis C. Garcia-Peraza-Herrera , Tom Vercauteren , Prokar Dasgupta , Alejandro Granados , Sébastien Ourselin","doi":"10.1016/j.media.2024.103366","DOIUrl":"10.1016/j.media.2024.103366","url":null,"abstract":"<div><div>Online surgical phase recognition plays a significant role towards building contextual tools that could quantify performance and oversee the execution of surgical workflows. Current approaches are limited since they train spatial feature extractors using frame-level supervision that could lead to incorrect predictions due to similar frames appearing at different phases, and poorly fuse local and global features due to computational constraints which can affect the analysis of long videos commonly encountered in surgical interventions. In this paper, we present a two-stage method, called Long Video Transformer (LoViT), emphasizing the development of a temporally-rich spatial feature extractor and a phase transition map. The temporally-rich spatial feature extractor is designed to capture critical temporal information within the surgical video frames. The phase transition map provides essential insights into the dynamic transitions between different surgical phases. LoViT combines these innovations with a multiscale temporal aggregator consisting of two cascaded L-Trans modules based on self-attention, followed by a G-Informer module based on <em>ProbSparse</em> self-attention for processing global temporal information. The multi-scale temporal head then leverages the temporally-rich spatial features and phase transition map to classify surgical phases using phase transition-aware supervision. Our approach outperforms state-of-the-art methods on the Cholec80 and AutoLaparo datasets consistently. Compared to Trans-SVNet, LoViT achieves a 2.4 pp (percentage point) improvement in video-level accuracy on Cholec80 and a 3.1 pp improvement on AutoLaparo. Our results demonstrate the effectiveness of our approach in achieving state-of-the-art performance of surgical phase recognition on two datasets of different surgical procedures and temporal sequencing characteristics. The project page is available at <span><span>https://github.com/MRUIL/LoViT</span><svg><path></path></svg></span>.</div></div>","PeriodicalId":18328,"journal":{"name":"Medical image analysis","volume":"99 ","pages":"Article 103366"},"PeriodicalIF":10.7,"publicationDate":"2024-10-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142442817","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A Foundation Language-Image Model of the Retina (FLAIR): encoding expert knowledge in text supervision","authors":"Julio Silva-Rodríguez , Hadi Chakor , Riadh Kobbi , Jose Dolz , Ismail Ben Ayed","doi":"10.1016/j.media.2024.103357","DOIUrl":"10.1016/j.media.2024.103357","url":null,"abstract":"<div><div>Foundation vision-language models are currently transforming computer vision, and are on the rise in medical imaging fueled by their very promising generalization capabilities. However, the initial attempts to transfer this new paradigm to medical imaging have shown less impressive performances than those observed in other domains, due to the significant domain shift and the complex, expert domain knowledge inherent to medical-imaging tasks. Motivated by the need for domain-expert foundation models, we present FLAIR, a pre-trained vision-language model for universal retinal fundus image understanding. To this end, we compiled 38 open-access, mostly categorical fundus imaging datasets from various sources, with up to 101 different target conditions and 288,307 images. We integrate the expert’s domain knowledge in the form of descriptive textual prompts, during both pre-training and zero-shot inference, enhancing the less-informative categorical supervision of the data. Such a textual expert’s knowledge, which we compiled from the relevant clinical literature and community standards, describes the fine-grained features of the pathologies as well as the hierarchies and dependencies between them. We report comprehensive evaluations, which illustrate the benefit of integrating expert knowledge and the strong generalization capabilities of FLAIR under difficult scenarios with domain shifts or unseen categories. When adapted with a lightweight linear probe, FLAIR outperforms fully-trained, dataset-focused models, more so in the few-shot regimes. Interestingly, FLAIR outperforms by a wide margin larger-scale generalist image-language models and retina domain-specific self-supervised networks, which emphasizes the potential of embedding experts’ domain knowledge and the limitations of generalist models in medical imaging. The pre-trained model is available at: <span><span>https://github.com/jusiro/FLAIR</span><svg><path></path></svg></span>.</div></div>","PeriodicalId":18328,"journal":{"name":"Medical image analysis","volume":"99 ","pages":"Article 103357"},"PeriodicalIF":10.7,"publicationDate":"2024-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142442816","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"PViT-AIR: Puzzling vision transformer-based affine image registration for multi histopathology and faxitron images of breast tissue","authors":"Negar Golestani , Aihui Wang , Golnaz Moallem , Gregory R. Bean , Mirabela Rusu","doi":"10.1016/j.media.2024.103356","DOIUrl":"10.1016/j.media.2024.103356","url":null,"abstract":"<div><div>Breast cancer is a significant global public health concern, with various treatment options available based on tumor characteristics. Pathological examination of excision specimens after surgery provides essential information for treatment decisions. However, the manual selection of representative sections for histological examination is laborious and subjective, leading to potential sampling errors and variability, especially in carcinomas that have been previously treated with chemotherapy. Furthermore, the accurate identification of residual tumors presents significant challenges, emphasizing the need for systematic or assisted methods to address this issue. In order to enable the development of deep-learning algorithms for automated cancer detection on radiology images, it is crucial to perform radiology-pathology registration, which ensures the generation of accurately labeled ground truth data. The alignment of radiology and histopathology images plays a critical role in establishing reliable cancer labels for training deep-learning algorithms on radiology images. However, aligning these images is challenging due to their content and resolution differences, tissue deformation, artifacts, and imprecise correspondence. We present a novel deep learning-based pipeline for the affine registration of faxitron images, the x-ray representations of macrosections of ex-vivo breast tissue, and their corresponding histopathology images of tissue segments. The proposed model combines convolutional neural networks and vision transformers, allowing it to effectively capture both local and global information from the entire tissue macrosection as well as its segments. This integrated approach enables simultaneous registration and stitching of image segments, facilitating segment-to-macrosection registration through a puzzling-based mechanism. To address the limitations of multi-modal ground truth data, we tackle the problem by training the model using synthetic mono-modal data in a weakly supervised manner. The trained model demonstrated successful performance in multi-modal registration, yielding registration results with an average landmark error of 1.51 mm <span><math><mrow><mo>(</mo><mo>±</mo><mn>2</mn><mo>.</mo><mn>40</mn><mo>)</mo></mrow></math></span>, and stitching distance of 1.15 mm <span><math><mrow><mo>(</mo><mo>±</mo><mn>0</mn><mo>.</mo><mn>94</mn><mo>)</mo></mrow></math></span>. The results indicate that the model performs significantly better than existing baselines, including both deep learning-based and iterative models, and it is also approximately 200 times faster than the iterative approach. 
This work bridges the gap in the current research and clinical workflow and has the potential to improve efficiency and accuracy in breast cancer evaluation and streamline pathology workflow.</div></div>","PeriodicalId":18328,"journal":{"name":"Medical image analysis","volume":"99 ","pages":"Article 103356"},"PeriodicalIF":10.7,"publicationDate":"2024-09-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142391706","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
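A minimal PyTorch sketch of affine registration with a learned transform, the basic operation underlying the pipeline above: a small CNN predicts six affine parameters that are applied through a differentiable sampling grid. The architecture, identity initialization, and image sizes are assumptions; the puzzle-based stitching and multi-modal training are not reproduced.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class AffineRegNet(nn.Module):
    """Toy affine registration: predict a 2x3 affine matrix for the moving image
    and apply it with a differentiable sampling grid."""
    def __init__(self):
        super().__init__()
        self.cnn = nn.Sequential(nn.Conv2d(2, 16, 3, stride=2, padding=1), nn.ReLU(),
                                 nn.AdaptiveAvgPool2d(1), nn.Flatten(),
                                 nn.Linear(16, 6))
        self.cnn[-1].weight.data.zero_()                   # start at the identity transform
        self.cnn[-1].bias.data = torch.tensor([1., 0., 0., 0., 1., 0.])

    def forward(self, fixed, moving):                      # (B, 1, H, W) each
        theta = self.cnn(torch.cat([fixed, moving], 1)).view(-1, 2, 3)
        grid = F.affine_grid(theta, moving.shape, align_corners=False)
        return F.grid_sample(moving, grid, align_corners=False), theta

fixed, moving = torch.rand(1, 1, 128, 128), torch.rand(1, 1, 128, 128)
warped, theta = AffineRegNet()(fixed, moving)
print(warped.shape, theta.shape)   # torch.Size([1, 1, 128, 128]) torch.Size([1, 2, 3])
```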
{"title":"Multi-contrast image super-resolution with deformable attention and neighborhood-based feature aggregation (DANCE): Applications in anatomic and metabolic MRI","authors":"Wenxuan Chen , Sirui Wu , Shuai Wang , Zhongsen Li , Jia Yang , Huifeng Yao , Qiyuan Tian , Xiaolei Song","doi":"10.1016/j.media.2024.103359","DOIUrl":"10.1016/j.media.2024.103359","url":null,"abstract":"<div><div>Multi-contrast magnetic resonance imaging (MRI) reflects information about human tissues from different perspectives and has wide clinical applications. By utilizing the auxiliary information from reference images (Refs) in the easy-to-obtain modality, multi-contrast MRI super-resolution (SR) methods can synthesize high-resolution (HR) images from their low-resolution (LR) counterparts in the hard-to-obtain modality. In this study, we systematically discussed the potential impacts caused by cross-modal misalignments between LRs and Refs and, based on this discussion, proposed a novel deep-learning-based method with <strong>D</strong>eformable <strong>A</strong>ttention and <strong>N</strong>eighborhood-based feature aggregation to be <strong>C</strong>omputationally <strong>E</strong>fficient (DANCE) and insensitive to misalignments. Our method has been evaluated in two public MRI datasets, i.e., IXI and FastMRI, and an in-house MR metabolic imaging dataset with amide proton transfer weighted (APTW) images. Experimental results reveal that our method consistently outperforms baselines in various scenarios, with significant superiority observed in the misaligned group of IXI dataset and the prospective study of the clinical dataset. The robustness study proves that our method is insensitive to misalignments, maintaining an average PSNR of 30.67 dB when faced with a maximum range of ±9°and ±9 pixels of rotation and translation on Refs. Given our method’s desirable comprehensive performance, good robustness, and moderate computational complexity, it possesses substantial potential for clinical applications.</div></div>","PeriodicalId":18328,"journal":{"name":"Medical image analysis","volume":"99 ","pages":"Article 103359"},"PeriodicalIF":10.7,"publicationDate":"2024-09-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142391705","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A Flow-based Truncated Denoising Diffusion Model for super-resolution Magnetic Resonance Spectroscopic Imaging","authors":"Siyuan Dong , Zhuotong Cai , Gilbert Hangel , Wolfgang Bogner , Georg Widhalm , Yaqing Huang , Qinghao Liang , Chenyu You , Chathura Kumaragamage , Robert K. Fulbright , Amit Mahajan , Amin Karbasi , John A. Onofrey , Robin A. de Graaf , James S. Duncan","doi":"10.1016/j.media.2024.103358","DOIUrl":"10.1016/j.media.2024.103358","url":null,"abstract":"<div><div>Magnetic Resonance Spectroscopic Imaging (MRSI) is a non-invasive imaging technique for studying metabolism and has become a crucial tool for understanding neurological diseases, cancers and diabetes. High spatial resolution MRSI is needed to characterize lesions, but in practice MRSI is acquired at low resolution due to time and sensitivity restrictions caused by the low metabolite concentrations. Therefore, there is an imperative need for a post-processing approach to generate high-resolution MRSI from low-resolution data that can be acquired fast and with high sensitivity. Deep learning-based super-resolution methods provided promising results for improving the spatial resolution of MRSI, but they still have limited capability to generate accurate and high-quality images. Recently, diffusion models have demonstrated superior learning capability than other generative models in various tasks, but sampling from diffusion models requires iterating through a large number of diffusion steps, which is time-consuming. This work introduces a Flow-based Truncated Denoising Diffusion Model (FTDDM) for super-resolution MRSI, which shortens the diffusion process by truncating the diffusion chain, and the truncated steps are estimated using a normalizing flow-based network. The network is conditioned on upscaling factors to enable multi-scale super-resolution. To train and evaluate the deep learning models, we developed a <sup>1</sup>H-MRSI dataset acquired from 25 high-grade glioma patients. We demonstrate that FTDDM outperforms existing generative models while speeding up the sampling process by over 9-fold compared to the baseline diffusion model. Neuroradiologists’ evaluations confirmed the clinical advantages of our method, which also supports uncertainty estimation and sharpness adjustment, extending its potential clinical applications.</div></div>","PeriodicalId":18328,"journal":{"name":"Medical image analysis","volume":"99 ","pages":"Article 103358"},"PeriodicalIF":10.7,"publicationDate":"2024-09-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142356947","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Label refinement network from synthetic error augmentation for medical image segmentation","authors":"Shuai Chen , Antonio Garcia-Uceda , Jiahang Su , Gijs van Tulder , Lennard Wolff , Theo van Walsum , Marleen de Bruijne","doi":"10.1016/j.media.2024.103355","DOIUrl":"10.1016/j.media.2024.103355","url":null,"abstract":"<div><div>Deep convolutional neural networks for image segmentation do not learn the label structure explicitly and may produce segmentations with an incorrect structure, e.g., with disconnected cylindrical structures in the segmentation of tree-like structures such as airways or blood vessels. In this paper, we propose a novel label refinement method to correct such errors from an initial segmentation, implicitly incorporating information about label structure. This method features two novel parts: (1) a model that generates synthetic structural errors, and (2) a label appearance simulation network that produces segmentations with synthetic errors that are similar in appearance to the real initial segmentations. Using these segmentations with synthetic errors and the original images, the label refinement network is trained to correct errors and improve the initial segmentations. The proposed method is validated on two segmentation tasks: airway segmentation from chest computed tomography (CT) scans and brain vessel segmentation from 3D CT angiography (CTA) images of the brain. In both applications, our method significantly outperformed a standard 3D U-Net, four previous label refinement methods, and a U-Net trained with a loss tailored for tubular structures. Improvements are even larger when additional unlabeled data is used for model training. In an ablation study, we demonstrate the value of the different components of the proposed method.</div></div>","PeriodicalId":18328,"journal":{"name":"Medical image analysis","volume":"99 ","pages":"Article 103355"},"PeriodicalIF":10.7,"publicationDate":"2024-09-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142378057","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"MUsculo-Skeleton-Aware (MUSA) deep learning for anatomically guided head-and-neck CT deformable registration","authors":"Hengjie Liu , Elizabeth McKenzie , Di Xu , Qifan Xu , Robert K. Chin , Dan Ruan , Ke Sheng","doi":"10.1016/j.media.2024.103351","DOIUrl":"10.1016/j.media.2024.103351","url":null,"abstract":"<div><div>Deep-learning-based deformable image registration (DL-DIR) has demonstrated improved accuracy compared to time-consuming non-DL methods across various anatomical sites. However, DL-DIR is still challenging in heterogeneous tissue regions with large deformation. In fact, several state-of-the-art DL-DIR methods fail to capture the large, anatomically plausible deformation when tested on head-and-neck computed tomography (CT) images. These results allude to the possibility that such complex head-and-neck deformation may be beyond the capacity of a single network structure or a homogeneous smoothness regularization. To address the challenge of combined multi-scale musculoskeletal motion and soft tissue deformation in the head-and-neck region, we propose a MUsculo-Skeleton-Aware (MUSA) framework to anatomically guide DL-DIR by leveraging the explicit multiresolution strategy and the inhomogeneous deformation constraints between the bony structures and soft tissue. The proposed method decomposes the complex deformation into a bulk posture change and residual fine deformation. It can accommodate both inter- and intra- subject registration. Our results show that the MUSA framework can consistently improve registration accuracy and, more importantly, the plausibility of deformation for various network architectures. The code will be publicly available at <span><span>https://github.com/HengjieLiu/DIR-MUSA</span><svg><path></path></svg></span>.</div></div>","PeriodicalId":18328,"journal":{"name":"Medical image analysis","volume":"99 ","pages":"Article 103351"},"PeriodicalIF":10.7,"publicationDate":"2024-09-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142400703","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"DeepResBat: Deep residual batch harmonization accounting for covariate distribution differences","authors":"Lijun An , Chen Zhang , Naren Wulan , Shaoshi Zhang , Pansheng Chen , Fang Ji , Kwun Kei Ng , Christopher Chen , Juan Helen Zhou , B.T. Thomas Yeo , Alzheimer's Disease Neuroimaging InitiativeAustralian Imaging Biomarkers and Lifestyle Study of Aging","doi":"10.1016/j.media.2024.103354","DOIUrl":"10.1016/j.media.2024.103354","url":null,"abstract":"<div><div>Pooling MRI data from multiple datasets requires harmonization to reduce undesired inter-site variabilities, while preserving effects of biological variables (or covariates). The popular harmonization approach ComBat uses a mixed effect regression framework that explicitly accounts for covariate distribution differences across datasets. There is also significant interest in developing harmonization approaches based on deep neural networks (DNNs), such as conditional variational autoencoder (cVAE). However, current DNN approaches do not explicitly account for covariate distribution differences across datasets. Here, we provide mathematical results, suggesting that not accounting for covariates can lead to suboptimal harmonization. We propose two DNN-based covariate-aware harmonization approaches: covariate VAE (coVAE) and DeepResBat. The coVAE approach is a natural extension of cVAE by concatenating covariates and site information with site- and covariate-invariant latent representations. DeepResBat adopts a residual framework inspired by ComBat. DeepResBat first removes the effects of covariates with nonlinear regression trees, followed by eliminating site differences with cVAE. Finally, covariate effects are added back to the harmonized residuals. Using three datasets from three continents with a total of 2787 participants and 10,085 anatomical T1 scans, we find that DeepResBat and coVAE outperformed ComBat, CovBat and cVAE in terms of removing dataset differences, while enhancing biological effects of interest. However, coVAE hallucinates spurious associations between anatomical MRI and covariates even when no association exists. Future studies proposing DNN-based harmonization approaches should be aware of this false positive pitfall. Overall, our results suggest that DeepResBat is an effective deep learning alternative to ComBat. Code for DeepResBat can be found here: <span><span>https://github.com/ThomasYeoLab/CBIG/tree/master/stable_projects/harmonization/An2024_DeepResBat</span><svg><path></path></svg></span>.</div></div>","PeriodicalId":18328,"journal":{"name":"Medical image analysis","volume":"99 ","pages":"Article 103354"},"PeriodicalIF":10.7,"publicationDate":"2024-09-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142378046","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}