Xin Gao , Meihui Zhang , Junjie Li , Shanbo Zhao , Zhizheng Zhuo , Liying Qu , Jinyuan Weng , Li Chai , Yunyun Duan , Chuyang Ye , Yaou Liu
{"title":"Anatomy-guided slice-description interaction for multimodal brain disease diagnosis based on 3D image and radiological report","authors":"Xin Gao , Meihui Zhang , Junjie Li , Shanbo Zhao , Zhizheng Zhuo , Liying Qu , Jinyuan Weng , Li Chai , Yunyun Duan , Chuyang Ye , Yaou Liu","doi":"10.1016/j.compmedimag.2025.102556","DOIUrl":null,"url":null,"abstract":"<div><div>Accurate brain disease diagnosis based on radiological images is desired in clinical practice as it can facilitate early intervention and reduce the risk of damage. However, existing unimodal image-based models struggle to process high-dimensional 3D brain imaging data effectively. Multimodal disease diagnosis approaches based on medical images and corresponding radiological reports achieved promising progress with the development of vision-language models. However, most multimodal methods handle 2D images and cannot be directly applied to brain disease diagnosis that uses 3D images. Therefore, in this work we develop a multimodal brain disease diagnosis model that takes 3D brain images and their radiological reports as input. Motivated by the fact that radiologists scroll through image slices and write important descriptions into the report accordingly, we propose a slice-description cross-modality interaction mechanism to realize fine-grained multimodal data interaction. Moreover, since previous medical research has demonstrated potential correlation between anatomical location of anomalies and diagnosis results, we further explore the use of brain anatomical prior knowledge to improve the multimodal interaction. Based on the report description, the prior knowledge filters the image information by suppressing irrelevant regions and enhancing relevant slices. Our method was validated with two brain disease diagnosis tasks. The results indicate that our model outperforms competing unimodal and multimodal methods for brain disease diagnosis. In particular, it has yielded an average accuracy improvement of 15.87% and 7.39% compared with the image-based and multimodal competing methods, respectively.</div></div>","PeriodicalId":50631,"journal":{"name":"Computerized Medical Imaging and Graphics","volume":"123 ","pages":"Article 102556"},"PeriodicalIF":5.4000,"publicationDate":"2025-04-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Computerized Medical Imaging and Graphics","FirstCategoryId":"5","ListUrlMain":"https://www.sciencedirect.com/science/article/pii/S0895611125000655","RegionNum":2,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"ENGINEERING, BIOMEDICAL","Score":null,"Total":0}
引用次数: 0
Abstract
Accurate brain disease diagnosis based on radiological images is desired in clinical practice as it can facilitate early intervention and reduce the risk of damage. However, existing unimodal image-based models struggle to process high-dimensional 3D brain imaging data effectively. Multimodal disease diagnosis approaches based on medical images and corresponding radiological reports achieved promising progress with the development of vision-language models. However, most multimodal methods handle 2D images and cannot be directly applied to brain disease diagnosis that uses 3D images. Therefore, in this work we develop a multimodal brain disease diagnosis model that takes 3D brain images and their radiological reports as input. Motivated by the fact that radiologists scroll through image slices and write important descriptions into the report accordingly, we propose a slice-description cross-modality interaction mechanism to realize fine-grained multimodal data interaction. Moreover, since previous medical research has demonstrated potential correlation between anatomical location of anomalies and diagnosis results, we further explore the use of brain anatomical prior knowledge to improve the multimodal interaction. Based on the report description, the prior knowledge filters the image information by suppressing irrelevant regions and enhancing relevant slices. Our method was validated with two brain disease diagnosis tasks. The results indicate that our model outperforms competing unimodal and multimodal methods for brain disease diagnosis. In particular, it has yielded an average accuracy improvement of 15.87% and 7.39% compared with the image-based and multimodal competing methods, respectively.
期刊介绍:
The purpose of the journal Computerized Medical Imaging and Graphics is to act as a source for the exchange of research results concerning algorithmic advances, development, and application of digital imaging in disease detection, diagnosis, intervention, prevention, precision medicine, and population health. Included in the journal will be articles on novel computerized imaging or visualization techniques, including artificial intelligence and machine learning, augmented reality for surgical planning and guidance, big biomedical data visualization, computer-aided diagnosis, computerized-robotic surgery, image-guided therapy, imaging scanning and reconstruction, mobile and tele-imaging, radiomics, and imaging integration and modeling with other information relevant to digital health. The types of biomedical imaging include: magnetic resonance, computed tomography, ultrasound, nuclear medicine, X-ray, microwave, optical and multi-photon microscopy, video and sensory imaging, and the convergence of biomedical images with other non-imaging datasets.