TongueNet: a multi-modal fusion and multi-label classification model for traditional Chinese Medicine tongue diagnosis.

IF 3.2 | Zone 3 (Medicine) | Q2 PHYSIOLOGY

Frontiers in Physiology | Pub Date: 2025-04-25 | eCollection Date: 2025-01-01 | DOI: 10.3389/fphys.2025.1527751

Lijuan Yang, Qiumei Dong, Da Lin, Xinliang Lü


Citations: 0

Abstract



Tongue diagnosis in Traditional Chinese Medicine (TCM) plays a crucial role in clinical practice. By observing the shape, color, and coating of the tongue, practitioners can assist in determining the nature and location of a disease. However, the field of tongue diagnosis currently faces challenges such as data scarcity and a lack of efficient multimodal diagnostic models, making it difficult to fully align with TCM theories and clinical needs. Additionally, existing methods generally lack multi-label classification capabilities, making it challenging to simultaneously meet the multidimensional requirements of TCM diagnosis for disease nature and location. To address these issues, this paper proposes TongueNet, a multimodal deep learning model that integrates tongue image data with text-based features. The model utilizes a Hierarchical Aggregation Network (HAN) and a Feature Space Projection Module to efficiently extract and fuse features while introducing consistency and complementarity constraints to optimize multimodal information fusion. Furthermore, the model incorporates a multi-scale attention mechanism (EMA) to enhance the diversity and accuracy of feature weighting and employs a Kolmogorov-Arnold Network (KAN) instead of traditional MLPs for output optimization, thereby improving the representation of complex features. For model training, this study integrates three publicly available tongue image datasets from the Roboflow platform and enlists multiple experts for multimodal annotation, incorporating multi-label information on disease nature and location to align with TCM clinical needs. Experimental results demonstrate that TongueNet outperforms existing models in both disease nature and disease location classification tasks. Specifically, in the disease nature classification task, it achieves 89.12% accuracy and an AUC of 83%; in the disease location classification task, it achieves 86.47% accuracy and an AUC of 81%. 
Moreover, TongueNet contains only 32.1 M parameters, significantly reducing computational resource requirements while maintaining high diagnostic performance. TongueNet provides a new approach for the intelligent development of TCM tongue diagnosis.
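The abstract reports accuracy and AUC for two multi-label tasks but does not say how these metrics are aggregated across labels. The sketch below shows one common convention and is not the authors' evaluation code: it assumes accuracy is counted over individual (sample, label) decisions at a 0.5 threshold and that AUC is macro-averaged over labels. The function names, the threshold, and the toy data are all hypothetical.

```python
from typing import List, Sequence


def binary_auc(y_true: Sequence[int], y_score: Sequence[float]) -> float:
    """AUC as the probability that a random positive outscores a random negative.

    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for t, s in zip(y_true, y_score) if t == 1]
    neg = [s for t, s in zip(y_true, y_score) if t == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative sample")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def macro_auc(Y_true: List[List[int]], Y_score: List[List[float]]) -> float:
    """Average the per-label AUC over all label columns (assumed aggregation)."""
    n_labels = len(Y_true[0])
    aucs = [
        binary_auc([row[j] for row in Y_true], [row[j] for row in Y_score])
        for j in range(n_labels)
    ]
    return sum(aucs) / n_labels


def label_accuracy(
    Y_true: List[List[int]], Y_score: List[List[float]], threshold: float = 0.5
) -> float:
    """Fraction of individual (sample, label) decisions that are correct."""
    correct = 0
    total = 0
    for t_row, s_row in zip(Y_true, Y_score):
        for t, s in zip(t_row, s_row):
            correct += int((s >= threshold) == bool(t))
            total += 1
    return correct / total


# Toy data: 4 samples, 2 hypothetical disease-nature labels per sample.
Y_true = [[1, 0], [0, 1], [1, 1], [0, 0]]
Y_score = [[0.9, 0.2], [0.4, 0.7], [0.6, 0.3], [0.1, 0.4]]

print(label_accuracy(Y_true, Y_score))  # 0.875 (7 of 8 decisions correct)
print(macro_auc(Y_true, Y_score))       # 0.875 (mean of 1.0 and 0.75)
```

Other conventions, such as exact-match accuracy per sample or micro-averaged AUC, would give different numbers on the same predictions, so the aggregation matters when comparing reported figures across papers.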


Source journal

Frontiers in Physiology (PHYSIOLOGY)

CiteScore: 6.50
Self-citation rate: 5.00%
Articles published: 2608
Review time: 14 weeks

About the journal: Frontiers in Physiology is a leading journal in its field, publishing rigorously peer-reviewed research on the physiology of living systems, from the subcellular and molecular domains to the intact organism, and its interaction with the environment. Field Chief Editor George E. Billman at the Ohio State University Columbus is supported by an outstanding Editorial Board of international researchers. This multidisciplinary open-access journal is at the forefront of disseminating and communicating scientific knowledge and impactful discoveries to researchers, academics, clinicians and the public worldwide.