MM-HiFuse: multi-modal multi-task hierarchical feature fusion for esophagus cancer staging and differentiation classification

Xiangzuo Huo, Shengwei Tian, Long Yu, Wendong Zhang, Aolun Li, Qimeng Yang, Jinmiao Song

Complex & Intelligent Systems, published 2025-01-02. DOI: 10.1007/s40747-024-01708-5 (https://doi.org/10.1007/s40747-024-01708-5). JCR Q1, Computer Science, Artificial Intelligence.
Cited by: 0
Abstract
Esophageal cancer is a globally significant but understudied cancer with high mortality rates. Staging and differentiation grading are crucial for determining a patient's prognosis and surgical treatment plan, and for improving the chances of survival. Endoscopy and histopathological examination are considered the gold standard for esophageal cancer diagnosis. Previous deep learning-based studies of esophageal cancer, however, have been limited to single-modal features, resulting in inadequate classification results. In response to these limitations, multi-modal learning has emerged as a promising alternative for medical image analysis tasks. In this paper, we propose MM-HiFuse, a hierarchical feature fusion network for multi-modal multitask learning that improves classification accuracy for esophageal cancer staging and differentiation level. The proposed architecture combines low-level to deep-level features of both pathological and endoscopic images to achieve accurate classification. The key characteristics of MM-HiFuse are: (i) a parallel hierarchy of convolution and self-attention layers designed specifically for pathological and endoscopic image features; and (ii) a multi-modal hierarchical feature fusion module (MHF) together with a new multitask weighted combination loss function. These components extract multi-modal representations at different semantic scales and allow the two tasks to complement each other, leading to improved classification performance. Experimental results demonstrate that MM-HiFuse outperforms single-modal methods in esophageal cancer staging and differentiation classification. Our findings support the early diagnosis and accurate staging of esophageal cancer and offer new inspiration for applying multi-modal multitask learning to medical image analysis.
Code is available at https://github.com/huoxiangzuo/MM-HiFuse.
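The abstract describes a multitask weighted combination loss that joins the staging and differentiation objectives. As a rough illustration of that idea (not the paper's actual implementation — the weights, the plain cross-entropy terms, and the function names below are assumptions), the two per-task losses can be combined with fixed scalar weights:

```python
import numpy as np

def softmax(logits):
    # Numerically stable softmax over the last axis.
    z = logits - logits.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def cross_entropy(logits, labels):
    # Mean negative log-likelihood of the true class.
    probs = softmax(logits)
    return -np.mean(np.log(probs[np.arange(len(labels)), labels] + 1e-12))

def multitask_weighted_loss(stage_logits, stage_labels,
                            diff_logits, diff_labels,
                            w_stage=0.5, w_diff=0.5):
    # Weighted combination of the staging and differentiation losses.
    # The 0.5/0.5 weights are purely illustrative placeholders.
    return (w_stage * cross_entropy(stage_logits, stage_labels)
            + w_diff * cross_entropy(diff_logits, diff_labels))
```

In this kind of setup the two classification heads share the fused multi-modal backbone, so gradients from both tasks flow into the same representations; the weights control how much each task shapes the shared features.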
Journal description:
Complex & Intelligent Systems aims to provide a forum for presenting and discussing novel approaches, tools and techniques aimed at cross-fertilization between the broad fields of complex systems, computational simulation, and intelligent analytics and visualization. The transdisciplinary research on which the journal focuses will expand the boundaries of our understanding by investigating the principles and processes underlying many of the most profound problems facing society today.