Evaluating and Improving Syndrome Differentiation Thinking Ability in Large Language Models: Method Development Study.

IF 3.8 | Zone 3, Medicine | JCR Q2, MEDICAL INFORMATICS

JMIR Medical Informatics Pub Date : 2025-06-20 DOI:10.2196/75103

Chunliang Chen, Xinyu Wang, Ming Guan, Wenjing Yue, Yuanbin Wu, Ya Zhou, Xiaoling Wang

JMIR Medical Informatics, Volume 13, e75103. PMC full text: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12204376/pdf/

Citations: 0

Abstract

Background: Large language models (LLMs) offer new opportunities to advance the intelligent development of traditional Chinese medicine (TCM). Syndrome differentiation thinking is an essential part of TCM, and equipping LLMs with this capability is a crucial step toward more effective clinical applications of TCM. However, because TCM syndrome differentiation thinking is complex, acquiring this ability remains a considerable challenge for these models.

Objective: This study aims to evaluate the syndrome differentiation thinking ability of LLMs and to design a method that effectively enhances their performance in this area.

Methods: We decomposed the process of TCM syndrome differentiation thinking into three core tasks: pathogenesis inference, syndrome inference, and diagnostic suggestion. To evaluate LLM performance on these tasks, we constructed a high-quality evaluation dataset that serves as a reliable foundation for quantitative assessment. We also developed a method for generating instruction data based on the idea of an "open-book exam": we designed three customized data templates and dynamically retrieved task-relevant professional knowledge, which was inserted into predefined positions within the templates. This approach generates high-quality instruction data aligned with the distinctive characteristics of TCM syndrome differentiation thinking. We then used this instruction data to fine-tune the base model, enhancing its syndrome differentiation thinking ability.
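The "open-book exam" data generation described above can be illustrated with a minimal sketch. Everything here is an assumption for illustration, not the authors' implementation: the template wording, the three task keys, the tiny knowledge base (textbook-style TCM snippets), and the word-overlap retriever, which stands in for whatever retrieval the study actually used. The point is only the mechanism: retrieve knowledge relevant to a case, then insert it into a predefined slot of a task-specific template.

```python
import re

# Hypothetical knowledge base: illustrative TCM reference snippets, not the study's corpus.
KNOWLEDGE_BASE = [
    "Spleen qi deficiency: fatigue, poor appetite, loose stools, pale tongue.",
    "Liver qi stagnation: hypochondriac distension, irritability, frequent sighing, wiry pulse.",
    "Heart blood deficiency: palpitations, insomnia, dream-disturbed sleep, pale complexion.",
]

# Hypothetical templates, one per core task; {knowledge} is the predefined insertion slot.
TEMPLATES = {
    "pathogenesis": "Reference knowledge:\n{knowledge}\n\nCase: {case}\nQuestion: Infer the pathogenesis.",
    "syndrome": "Reference knowledge:\n{knowledge}\n\nCase: {case}\nQuestion: Identify the syndrome.",
    "suggestion": "Reference knowledge:\n{knowledge}\n\nCase: {case}\nQuestion: Give a diagnostic suggestion.",
}


def tokenize(text: str) -> set[str]:
    """Lowercased word set; a simple stand-in for a real retriever's representation."""
    return set(re.findall(r"[a-z]+", text.lower()))


def retrieve(case: str, k: int = 2) -> list[str]:
    """Return the k snippets sharing the most words with the case (assumed scoring)."""
    case_terms = tokenize(case)
    # sorted() is stable, so snippets with equal overlap keep their original order.
    ranked = sorted(KNOWLEDGE_BASE, key=lambda s: len(case_terms & tokenize(s)), reverse=True)
    return ranked[:k]


def build_instruction(task: str, case: str, answer: str, k: int = 2) -> dict:
    """Fill the task template with retrieved knowledge to form one instruction record."""
    knowledge = "\n".join(f"- {s}" for s in retrieve(case, k))
    prompt = TEMPLATES[task].format(knowledge=knowledge, case=case)
    return {"instruction": prompt, "output": answer}


if __name__ == "__main__":
    case = "Patient reports fatigue, poor appetite and loose stools; tongue is pale."
    record = build_instruction("syndrome", case, answer="Spleen qi deficiency")
    print(record["instruction"])
```

A production pipeline would likely replace the word-overlap scorer with a proper retriever (for example BM25 or dense embeddings); keeping the template slot separate from retrieval makes that swap independent of the three task templates.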

Results: We collected 200 medical cases for the evaluation dataset and standardized them into three types of task questions. We tested general and TCM-specific LLMs and compared their performance with that of our proposed solution. The findings showed that our method significantly enhanced the LLMs' syndrome differentiation thinking. Our model achieved an accuracy of 85.7% in Task 1 and 81.2% in Task 2, surpassing the best-performing TCM and general LLMs by 26.3% and 15.8%, respectively. In Task 3, our model achieved a similarity score of 84.3, indicating that its diagnostic suggestions were highly similar to those given by experts.
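The Task 1 and Task 2 figures are accuracies. The abstract does not state the scoring rule or how many questions each task contains, so the following is only a sketch under one assumption: each question has a single gold label, and a prediction counts as correct when it matches that label after case and whitespace normalization. The `normalize` and `accuracy` helpers are hypothetical, and the Task 3 similarity score is not sketched because its metric is not described.

```python
def normalize(label: str) -> str:
    """Case- and whitespace-insensitive comparison key (assumed normalization)."""
    return " ".join(label.lower().split())


def accuracy(predictions: list[str], gold: list[str]) -> float:
    """Percentage of questions whose predicted label matches the gold label exactly."""
    if len(predictions) != len(gold) or not gold:
        raise ValueError("need equally sized, non-empty prediction and gold lists")
    correct = sum(normalize(p) == normalize(g) for p, g in zip(predictions, gold))
    return 100.0 * correct / len(gold)


if __name__ == "__main__":
    preds = ["Spleen qi deficiency", "liver qi stagnation", "Heart blood deficiency", "x"]
    golds = ["spleen qi deficiency", "Liver Qi Stagnation", "Heart  blood deficiency", "y"]
    print(accuracy(preds, golds))  # 3 of 4 match after normalization
```

Exact matching is the strictest reasonable choice; if the study's questions were multiple choice, the same function applies to option letters instead of syndrome names.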

Conclusions: Existing general and TCM-specific LLMs still have significant limitations in the core tasks of syndrome differentiation thinking. Our research shows that fine-tuning LLMs with professionally designed instruction templates and high-quality generated instruction data can significantly improve their performance on these core tasks. The optimized LLMs produce reasoning results that are highly similar to, and consistent with, the opinions of domain experts, indicating that they can simulate syndrome differentiation thinking to a certain extent. These findings have important theoretical and practical significance for understanding the complexity of the clinical diagnosis and treatment process in TCM.


Source journal: JMIR Medical Informatics (Medicine: Health Informatics)

CiteScore: 7.90

Self-citation rate: 3.10%

Articles published: 173

Review time: 12 weeks

Journal description: JMIR Medical Informatics (JMI, ISSN 2291-9694) is a top-rated, tier A journal that focuses on clinical informatics, big data in health and health care, decision support for health professionals, electronic health records, and eHealth infrastructures and implementation. It emphasizes applied, translational research and has a broad readership that includes clinicians, CIOs, engineers, industry, and health informatics professionals. Published by JMIR Publications, publisher of the Journal of Medical Internet Research (JMIR), the leading eHealth/mHealth journal (Impact Factor 2016: 5.175), JMIR Med Inform has a slightly different scope (emphasizing applications for clinicians and health professionals rather than consumers/citizens, which are the focus of JMIR), publishes even faster, and also accepts papers that are more technical or more formative than those published in the Journal of Medical Internet Research.