Efficient T Staging in Nasopharyngeal Carcinoma via Deep Learning-Based Multi-Modal Classification

IF 3.3 · CAS Tier 3 (Medicine) · JCR Q1 (Radiology, Nuclear Medicine & Medical Imaging)
Dili Song , Xu Han , Yong Li , Hong Ye , Yongguang Cai , Liqun Chen , Lujian Xu , Ying Zou , Haibo Zhang , Diping Song
{"title":"基于深度学习的多模态分类在鼻咽癌中的高效T分期","authors":"Dili Song ,&nbsp;Xu Han ,&nbsp;Yong Li ,&nbsp;Hong Ye ,&nbsp;Yongguang Cai ,&nbsp;Liqun Chen ,&nbsp;Lujian Xu ,&nbsp;Ying Zou ,&nbsp;Haibo Zhang ,&nbsp;Diping Song","doi":"10.1016/j.ejrad.2025.112407","DOIUrl":null,"url":null,"abstract":"<div><h3>Background</h3><div>Accurate T staging of nasopharyngeal carcinoma (NPC) is crucial for precision therapeutic strategies. The T staging process is associated with significant challenges, including time consumption and variability among observers. This study aimed to develop an efficient, automated T staging system that supports personalized treatment and optimizes clinical workflows.</div></div><div><h3>Methods</h3><div>A total of 609 NPC patients were included, with 487 in the training cohort and 122 in the validation cohort. We employed a multi-modal learning framework that integrates MRI images and reports. Automatically delineated regions of interest (ROIs) served as masks. A hierarchical classification strategy (DeepTree) addressed complex staging challenges. We utilized Vision Transformer (ViT) to extract visual features and BERT to encode text features. To improve data fusion, we applied a Q-Former to integrate visual and textual information. The performances of the methods were evaluated using accuracy (ACC), the area under the receiver operating characteristic curve (AUC), precision, sensitivity (SEN), and specificity (SPE).</div></div><div><h3>Results</h3><div>Integrating images and text via Q-Former demonstrated superior performance overall, significantly surpassing single-modality methods. IT-DTM-BLIP2 demonstrated strong performance, achieving an accuracy (ACC) of 0.787 (95% CI 0.714–0.860). The area under the receiver operating characteristic curve (AUC) values were AUC1 (T2 vs. T3/T4) at 0.815 (0.71–0.900) and AUC2 (T3 vs. T4) at 0.876 (0.782–0.945).</div></div><div><h3>Conclusion</h3><div>Our multi-modal approach consistently performs well, offering a robust automated solution that eliminates the need for manual tumor delineation. This streamlines workflows, reduces subjectivity, and offers decision-making support that may improve workflow efficiency and encourage consistency.</div></div>","PeriodicalId":12063,"journal":{"name":"European Journal of Radiology","volume":"192 ","pages":"Article 112407"},"PeriodicalIF":3.3000,"publicationDate":"2025-09-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"Efficient T staging in nasopharyngeal carcinoma via deep Learning-Based Multi-Modal classification\",\"authors\":\"Dili Song ,&nbsp;Xu Han ,&nbsp;Yong Li ,&nbsp;Hong Ye ,&nbsp;Yongguang Cai ,&nbsp;Liqun Chen ,&nbsp;Lujian Xu ,&nbsp;Ying Zou ,&nbsp;Haibo Zhang ,&nbsp;Diping Song\",\"doi\":\"10.1016/j.ejrad.2025.112407\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"<div><h3>Background</h3><div>Accurate T staging of nasopharyngeal carcinoma (NPC) is crucial for precision therapeutic strategies. The T staging process is associated with significant challenges, including time consumption and variability among observers. This study aimed to develop an efficient, automated T staging system that supports personalized treatment and optimizes clinical workflows.</div></div><div><h3>Methods</h3><div>A total of 609 NPC patients were included, with 487 in the training cohort and 122 in the validation cohort. We employed a multi-modal learning framework that integrates MRI images and reports. 
Automatically delineated regions of interest (ROIs) served as masks. A hierarchical classification strategy (DeepTree) addressed complex staging challenges. We utilized Vision Transformer (ViT) to extract visual features and BERT to encode text features. To improve data fusion, we applied a Q-Former to integrate visual and textual information. The performances of the methods were evaluated using accuracy (ACC), the area under the receiver operating characteristic curve (AUC), precision, sensitivity (SEN), and specificity (SPE).</div></div><div><h3>Results</h3><div>Integrating images and text via Q-Former demonstrated superior performance overall, significantly surpassing single-modality methods. IT-DTM-BLIP2 demonstrated strong performance, achieving an accuracy (ACC) of 0.787 (95% CI 0.714–0.860). The area under the receiver operating characteristic curve (AUC) values were AUC1 (T2 vs. T3/T4) at 0.815 (0.71–0.900) and AUC2 (T3 vs. T4) at 0.876 (0.782–0.945).</div></div><div><h3>Conclusion</h3><div>Our multi-modal approach consistently performs well, offering a robust automated solution that eliminates the need for manual tumor delineation. This streamlines workflows, reduces subjectivity, and offers decision-making support that may improve workflow efficiency and encourage consistency.</div></div>\",\"PeriodicalId\":12063,\"journal\":{\"name\":\"European Journal of Radiology\",\"volume\":\"192 \",\"pages\":\"Article 112407\"},\"PeriodicalIF\":3.3000,\"publicationDate\":\"2025-09-04\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"European Journal of Radiology\",\"FirstCategoryId\":\"3\",\"ListUrlMain\":\"https://www.sciencedirect.com/science/article/pii/S0720048X25004930\",\"RegionNum\":3,\"RegionCategory\":\"医学\",\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"Q1\",\"JCRName\":\"RADIOLOGY, NUCLEAR MEDICINE & MEDICAL IMAGING\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"European Journal of Radiology","FirstCategoryId":"3","ListUrlMain":"https://www.sciencedirect.com/science/article/pii/S0720048X25004930","RegionNum":3,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"RADIOLOGY, NUCLEAR MEDICINE & MEDICAL IMAGING","Score":null,"Total":0}
Citations: 0

Abstract


Background

Accurate T staging of nasopharyngeal carcinoma (NPC) is crucial for precision therapeutic strategies, yet manual staging is time-consuming and subject to interobserver variability. This study aimed to develop an efficient, automated T staging system that supports personalized treatment and optimizes clinical workflows.

Methods

A total of 609 NPC patients were included: 487 in the training cohort and 122 in the validation cohort. We employed a multi-modal learning framework that integrates MR images and radiology reports, with automatically delineated regions of interest (ROIs) serving as masks. A hierarchical classification strategy (DeepTree) decomposed the staging task into sequential binary decisions (T2 vs. T3/T4, then T3 vs. T4). A Vision Transformer (ViT) extracted visual features, BERT encoded text features, and a Q-Former fused the visual and textual information. Performance was evaluated using accuracy (ACC), area under the receiver operating characteristic curve (AUC), precision, sensitivity (SEN), and specificity (SPE).
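The abstract names the building blocks but not their wiring. As a rough illustration only, the PyTorch sketch below shows one way a Q-Former-style module could fuse ViT image tokens with BERT text tokens for a single binary node of the hierarchy. The checkpoints (`google/vit-base-patch16-224-in21k`, `bert-base-chinese`), the query count, and the classification head are illustrative assumptions, not the authors' configuration.

```python
# Hypothetical sketch of ViT + BERT + Q-Former-style fusion; the checkpoints,
# dimensions, and query count are illustrative, not the paper's configuration.
import torch
import torch.nn as nn
from transformers import BertModel, ViTModel


class QFormerFusion(nn.Module):
    """Learned query tokens cross-attend to concatenated image/text token
    features; the mean-pooled queries feed a binary classification head."""

    def __init__(self, num_queries: int = 32, dim: int = 768,
                 num_heads: int = 8, num_classes: int = 2):
        super().__init__()
        self.queries = nn.Parameter(torch.randn(1, num_queries, dim) * 0.02)
        self.cross_attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.norm1 = nn.LayerNorm(dim)
        self.norm2 = nn.LayerNorm(dim)
        self.ffn = nn.Sequential(nn.Linear(dim, 4 * dim), nn.GELU(),
                                 nn.Linear(4 * dim, dim))
        self.head = nn.Linear(dim, num_classes)

    def forward(self, img_feats: torch.Tensor, txt_feats: torch.Tensor) -> torch.Tensor:
        # Image and text tokens together form the key/value memory.
        memory = torch.cat([img_feats, txt_feats], dim=1)
        q = self.queries.expand(img_feats.size(0), -1, -1)
        attended, _ = self.cross_attn(q, memory, memory)
        q = self.norm1(q + attended)
        q = self.norm2(q + self.ffn(q))
        return self.head(q.mean(dim=1))  # mean-pool queries -> class logits


vit = ViTModel.from_pretrained("google/vit-base-patch16-224-in21k")
bert = BertModel.from_pretrained("bert-base-chinese")  # reports assumed Chinese
fusion = QFormerFusion()

pixel_values = torch.randn(2, 3, 224, 224)    # stand-in for ROI-masked MR slices
input_ids = torch.randint(0, 21128, (2, 64))  # stand-in for tokenized report text
with torch.no_grad():
    img_feats = vit(pixel_values=pixel_values).last_hidden_state  # (B, 197, 768)
    txt_feats = bert(input_ids=input_ids).last_hidden_state       # (B, 64, 768)
logits = fusion(img_feats, txt_feats)         # (B, 2), e.g. T2 vs. T3/T4
```

The model name IT-DTM-BLIP2 suggests a BLIP-2-style Q-Former in the actual paper; the sketch replaces that with a single cross-attention block to stay self-contained.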

Results

Fusing images and text via the Q-Former yielded the best overall performance, significantly surpassing single-modality methods. IT-DTM-BLIP2 achieved an ACC of 0.787 (95% CI 0.714–0.860), with AUCs of 0.815 (95% CI 0.710–0.900) for T2 vs. T3/T4 (AUC1) and 0.876 (95% CI 0.782–0.945) for T3 vs. T4 (AUC2).
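To make the two reported AUCs concrete: under the hierarchical scheme, node 1 scores T2 vs. T3/T4 on all cases, and node 2 scores T3 vs. T4 on the advanced branch. The sketch below, with made-up scores and a hypothetical 0.5 routing threshold, shows how such chained decisions and both AUCs could be computed; it is not the authors' DeepTree implementation.

```python
# Hypothetical illustration of a two-node staging hierarchy; scores,
# threshold, and routing rule are made up, not the paper's DeepTree.
import numpy as np
from sklearn.metrics import roc_auc_score


def hierarchical_stage(p_advanced, p_t4, thr=0.5):
    """Node 1 separates T2 from T3/T4; node 2 separates T3 from T4
    among cases routed down the 'advanced' branch."""
    return np.where(p_advanced < thr, "T2",
                    np.where(p_t4 < thr, "T3", "T4"))


# Toy per-case probabilities from each node, plus ground-truth stages.
p_advanced = np.array([0.1, 0.8, 0.9, 0.3, 0.7, 0.95])
p_t4       = np.array([0.2, 0.3, 0.9, 0.4, 0.6, 0.8])
y_true     = np.array(["T2", "T3", "T4", "T2", "T4", "T4"])

print(hierarchical_stage(p_advanced, p_t4))

# AUC1: T2 vs. T3/T4, computed over all cases.
auc1 = roc_auc_score((y_true != "T2").astype(int), p_advanced)
# AUC2: T3 vs. T4, restricted to truly advanced (T3/T4) cases.
mask = y_true != "T2"
auc2 = roc_auc_score((y_true[mask] == "T4").astype(int), p_t4[mask])
print(f"AUC1 (T2 vs. T3/T4): {auc1:.3f}, AUC2 (T3 vs. T4): {auc2:.3f}")
```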

Conclusion

Our multi-modal approach performs consistently well, offering a robust automated solution that eliminates the need for manual tumor delineation. By reducing subjectivity and providing decision support, it may streamline workflows and encourage staging consistency.
Source journal: European Journal of Radiology
CiteScore: 6.70 · Self-citation rate: 3.00% · Annual publications: 398 · Review time: 42 days

About the journal: European Journal of Radiology is an international journal which aims to communicate to its readers state-of-the-art information on imaging developments, in the form of high-quality original research articles and timely reviews on current developments in the field. Its audience includes clinicians at all levels of training, including radiology trainees, newly qualified imaging specialists and the experienced radiologist. Its aim is to inform efficient, appropriate and evidence-based imaging practice to the benefit of patients worldwide.