Cross-modal attention model integrating tongue images and descriptions: a novel intelligent TCM approach for pathological organ diagnosis.
Quan Gan, Chen Wang, Zhaoman Zhong, Jiaying Wu, Qiwei Ge, Lei Shi, Jiaqing Shang, Chuanxia Liu
Frontiers in Physiology, vol. 16, article 1580985. Published 2025-04-23. DOI: 10.3389/fphys.2025.1580985. Full text (PMC): https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12059375/pdf/
Introduction: Tongue diagnosis is a fundamental technique in traditional Chinese medicine (TCM), in which clinicians examine the appearance of the tongue to assess the condition of the affected (pathological) organs. However, most existing research on intelligent tongue diagnosis analyzes tongue images alone and neglects the descriptive text that accompanies them, even though this text is an essential component of clinical diagnosis. To address this gap, we propose a novel Cross-Modal Pathological Organ Diagnosis Model that integrates tongue images and textual descriptions to classify pathological organs more accurately.
Methods: Our model extracts features from both the tongue images and the corresponding textual descriptions, then fuses them with a cross-modal attention mechanism to classify the pathological organ. This mechanism lets the model combine visual and textual information, addressing the limitations of using either modality alone.
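The abstract does not specify the encoders or the exact attention formulation, so the following is only a minimal sketch of the general idea, assuming standard scaled dot-product attention in which text-token features act as queries and image-region features act as keys and values. The learned query/key/value projections are omitted (treated as identity) for brevity, and the toy feature vectors, the mean-pool-and-concatenate fusion, and the linear classifier weights (`W`, `b`) are hypothetical illustrations, not the authors' model.

```python
import math
from typing import List

Matrix = List[List[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Multiply an (n x k) matrix by a (k x m) matrix."""
    return [
        [sum(a[i][t] * b[t][j] for t in range(len(b))) for j in range(len(b[0]))]
        for i in range(len(a))
    ]


def transpose(a: Matrix) -> Matrix:
    return [list(col) for col in zip(*a)]


def softmax(xs: List[float]) -> List[float]:
    # Subtract the max for numerical stability before exponentiating.
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def attention_weights(text_feats: Matrix, image_feats: Matrix) -> Matrix:
    """Scaled dot-product weights: each text token (query) over all image regions (keys)."""
    d = len(text_feats[0])
    scores = matmul(text_feats, transpose(image_feats))  # (n_text x n_regions)
    return [softmax([s / math.sqrt(d) for s in row]) for row in scores]


def cross_modal_attention(text_feats: Matrix, image_feats: Matrix) -> Matrix:
    """Each text token becomes a weighted average of image-region features (values)."""
    return matmul(attention_weights(text_feats, image_feats), image_feats)


def mean_pool(x: Matrix) -> List[float]:
    return [sum(col) / len(x) for col in zip(*x)]


def classify(text_feats: Matrix, image_feats: Matrix,
             weights: Matrix, bias: List[float]) -> List[float]:
    """Fuse both modalities and return one probability per (hypothetical) organ class."""
    attended = cross_modal_attention(text_feats, image_feats)
    # Hypothetical fusion by concatenation: pooled text features, then pooled image-informed features.
    fused = mean_pool(text_feats) + mean_pool(attended)
    logits = [sum(w * f for w, f in zip(row, fused)) + b for row, b in zip(weights, bias)]
    return softmax(logits)


if __name__ == "__main__":
    # Toy 2-d features; real encoders would produce much higher-dimensional vectors.
    text = [[1.0, 0.0], [0.0, 1.0]]               # 2 description tokens
    image = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]  # 3 tongue-image regions
    W = [[0.5, -0.5, 0.5, -0.5], [-0.5, 0.5, -0.5, 0.5]]  # 2 hypothetical classes x 4 fused dims
    b = [0.0, 0.0]
    print(attention_weights(text, image))
    print(classify(text, image, W, b))
```

Letting text tokens query image regions means each phrase of the description gathers the visual evidence most similar to it; the reverse direction (image regions querying text tokens), or both directions combined, is an equally plausible reading of "cross-modal attention", and the abstract does not say which one the model uses.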
Results: We evaluated the model on a self-constructed dataset. It achieved higher overall accuracy than the commonly used models it was compared against. Ablation studies in which only tongue images or only textual descriptions were used confirmed the significant benefit of multimodal fusion for diagnostic accuracy.
Discussion: This study offers a new perspective on intelligent tongue diagnosis in TCM by combining visual and textual data. The experimental findings underline the importance of cross-modal feature fusion for accurate pathological diagnosis. The approach contributes to more effective diagnostic systems and lays groundwork for further automation of TCM diagnosis.
About the journal:
Frontiers in Physiology is a leading journal in its field, publishing rigorously peer-reviewed research on the physiology of living systems, from the subcellular and molecular domains to the intact organism, and its interaction with the environment. Field Chief Editor George E. Billman at The Ohio State University, Columbus, is supported by an outstanding Editorial Board of international researchers. This multidisciplinary open-access journal is at the forefront of disseminating and communicating scientific knowledge and impactful discoveries to researchers, academics, clinicians and the public worldwide.