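Cross-modal attention model integrating tongue images and descriptions: a novel intelligent TCM approach for pathological organ diagnosis.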

IF 3.2 · Zone 3 (Medicine) · JCR Q2 (PHYSIOLOGY)
Frontiers in Physiology Pub Date : 2025-04-23 eCollection Date: 2025-01-01 DOI:10.3389/fphys.2025.1580985
Quan Gan, Chen Wang, Zhaoman Zhong, Jiaying Wu, Qiwei Ge, Lei Shi, Jiaqing Shang, Chuanxia Liu
Frontiers in Physiology, volume 16, article 1580985. Publication type: Journal Article. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12059375/pdf/
Citations: 0

Cross-modal attention model integrating tongue images and descriptions: a novel intelligent TCM approach for pathological organ diagnosis.

Introduction: Tongue diagnosis is a fundamental technique in traditional Chinese medicine (TCM), where clinicians evaluate the tongue's appearance to infer the condition of pathological organs. However, most existing research on intelligent tongue diagnosis primarily focuses on analyzing tongue images, often neglecting the important descriptive text that accompanies these images. This text is an essential component of clinical diagnosis. To overcome this gap, we propose a novel Cross-Modal Pathological Organ Diagnosis Model that integrates tongue images and textual descriptions for more accurate pathological classification.

Methods: Our model extracts features from both the tongue images and the corresponding textual descriptions. These features are then fused using a cross-modal attention mechanism to enhance the classification of pathological organs. The cross-modal attention mechanism enables the model to effectively combine visual and textual information, addressing the limitations of using either modality alone.
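The fusion step described in the Methods can be illustrated with a minimal NumPy sketch of cross-modal attention. The abstract does not specify the encoders, feature dimensions, attention direction, pooling, or number of organ classes, so everything here is an assumption made for illustration: the function names (`cross_attention`, `fuse_and_classify`, `init_params`), the bidirectional attention, the mean pooling, the random weights, and the five output classes are hypothetical and are not the authors' implementation.

```python
import numpy as np

def softmax(x, axis=-1):
    # Subtract the maximum along the axis for numerical stability.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def cross_attention(queries, keys_values, w_q, w_k, w_v):
    """Scaled dot-product attention in which one modality queries the other."""
    q = queries @ w_q                      # (n_q, d)
    k = keys_values @ w_k                  # (n_kv, d)
    v = keys_values @ w_v                  # (n_kv, d)
    scores = q @ k.T / np.sqrt(q.shape[-1])
    weights = softmax(scores, axis=-1)     # each query row sums to 1
    return weights @ v, weights

def fuse_and_classify(img_feats, txt_feats, params):
    # Assumed design: text tokens attend over image patches and vice versa.
    txt_ctx, _ = cross_attention(txt_feats, img_feats, *params["t2i"])
    img_ctx, _ = cross_attention(img_feats, txt_feats, *params["i2t"])
    # Mean-pool each attended sequence and concatenate into one fused vector.
    fused = np.concatenate([txt_ctx.mean(axis=0), img_ctx.mean(axis=0)])
    logits = fused @ params["w_out"] + params["b_out"]
    return softmax(logits)                 # probabilities over organ classes

def init_params(d_img, d_txt, d, n_classes, seed=0):
    # Random, untrained weights; a real model would learn these.
    rng = np.random.default_rng(seed)

    def mat(rows, cols):
        return rng.normal(scale=rows ** -0.5, size=(rows, cols))

    return {
        "t2i": (mat(d_txt, d), mat(d_img, d), mat(d_img, d)),
        "i2t": (mat(d_img, d), mat(d_txt, d), mat(d_txt, d)),
        "w_out": mat(2 * d, n_classes),
        "b_out": np.zeros(n_classes),
    }

# Hypothetical inputs: 49 image-patch features and 12 text-token features.
rng = np.random.default_rng(1)
img_feats = rng.normal(size=(49, 64))
txt_feats = rng.normal(size=(12, 32))
params = init_params(d_img=64, d_txt=32, d=16, n_classes=5)
probs = fuse_and_classify(img_feats, txt_feats, params)
print(probs.shape)
```

Attending in both directions lets each modality reweight the other before pooling. A single direction (for example, text querying image patches only) would be an equally plausible reading of the abstract.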

Results: We conducted experiments using a self-constructed dataset to evaluate our model's performance. The results demonstrate that our model outperforms common models regarding overall accuracy. Additionally, ablation studies, where either tongue images or textual descriptions were used alone, confirmed the significant benefit of multimodal fusion in improving diagnostic accuracy.

Discussion: This study introduces a new perspective on intelligent tongue diagnosis in TCM by incorporating visual and textual data. The experimental findings highlight the importance of cross-modal feature fusion for improving the accuracy of pathological diagnosis. Our approach not only contributes to the development of more effective diagnostic systems but also paves the way for future advancements in the automation of TCM diagnosis.

Source journal
CiteScore: 6.50
Self-citation rate: 5.00%
Articles published: 2608
Review time: 14 weeks
About the journal: Frontiers in Physiology is a leading journal in its field, publishing rigorously peer-reviewed research on the physiology of living systems, from the subcellular and molecular domains to the intact organism and its interaction with the environment. Field Chief Editor George E. Billman at the Ohio State University, Columbus, is supported by an outstanding Editorial Board of international researchers. This multidisciplinary open-access journal is at the forefront of disseminating and communicating scientific knowledge and impactful discoveries to researchers, academics, clinicians and the public worldwide.