Evaluating large language models for mild cognitive impairment among older adults: A bilingual comparison of ChatGPT, Gemini, and Kimi.

IF 2.3; Zone 3, Medicine; JCR Q2, Health Care Sciences & Services

Health Informatics Journal. Pub Date: 2025-07-01; Epub Date: 2025-09-16; DOI: 10.1177/14604582251381240

Yexuan Xiao, Qianhui Pan, Haoyuan Liu, Yilin He, Yuhe Zhang, Nan Jiang


Citations: 0

Abstract


Evaluating large language models for mild cognitive impairment among older adults: A bilingual comparison of ChatGPT, Gemini, and Kimi.

Objective: To evaluate large language models (LLMs) in managing mild cognitive impairment (MCI) and supporting nonspecialist healthcare professionals and care partners, comparing English and Chinese responses. Methods: Seventy-two MCI-related questions were submitted to ChatGPT-4o, Gemini, and Kimi. Responses were assessed for accuracy, comprehensibility, specificity, and actionability using a 5-point Likert scale. Statistical analyses included intraclass correlation coefficients and Mann-Whitney U tests. Results: LLMs performed best in the symptoms and diagnosis domain (M = 4.11 ± 0.15). Healthcare professionals' needs were better met than those of care partners, particularly in comprehensibility and actionability (p < .001). English responses were significantly more comprehensible and specific than Chinese responses (p < .001). Conclusion: This study highlights the potential of LLMs like ChatGPT, Gemini, and Kimi in supporting MCI management, especially in diagnosis and providing actionable insights. However, their performance varied across languages and user groups, with English responses generally more effective than Chinese. The findings emphasize the need for culturally and linguistically adapted LLMs to enhance accuracy and usability. Future research should focus on expanding user diversity, improving adaptability, and incorporating region-specific data to optimize LLMs for MCI care.
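The abstract names Mann-Whitney U tests as the method for comparing ratings between groups (for example, English versus Chinese responses). The following is a minimal sketch of that test for ordinal 5-point ratings, not the authors' actual analysis code. The ratings in the usage example are hypothetical and are not data from the study, and the normal approximation with tie correction and no continuity correction is an assumption made here for illustration.

```python
import math
from collections import Counter


def midranks(values):
    """Return 1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to cover the whole run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # positions i..j are 0-based
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def mann_whitney_u(x, y):
    """Return (U_x, U_y, two-sided p) using a tie-corrected normal approximation."""
    n1, n2 = len(x), len(y)
    pooled = list(x) + list(y)
    ranks = midranks(pooled)
    u1 = sum(ranks[:n1]) - n1 * (n1 + 1) / 2
    u2 = n1 * n2 - u1
    n = n1 + n2
    # Tie correction: sum of (t^3 - t) over each group of tied values.
    tie_term = sum(t**3 - t for t in Counter(pooled).values())
    sigma = math.sqrt(n1 * n2 / 12 * ((n + 1) - tie_term / (n * (n - 1))))
    if sigma == 0:
        return u1, u2, 1.0  # all observations identical: no evidence of a shift
    z = (u1 - n1 * n2 / 2) / sigma
    p = math.erfc(abs(z) / math.sqrt(2))  # two-sided tail of the standard normal
    return u1, u2, p


if __name__ == "__main__":
    # Hypothetical Likert ratings (1-5), invented purely for illustration.
    english_ratings = [5, 4, 4, 5, 3, 4, 5, 4]
    chinese_ratings = [3, 4, 3, 2, 4, 3, 3, 4]
    u_en, u_zh, p_value = mann_whitney_u(english_ratings, chinese_ratings)
    print(f"U(en)={u_en}, U(zh)={u_zh}, p={p_value:.4f}")
```

A rank-based test suits Likert data because the scores are ordinal and heavily tied, so mid-ranks and the tie correction matter more here than they would for continuous measurements.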


Source journal

Health Informatics Journal (Health Care Sciences & Services; Medical Informatics)

CiteScore: 7.80

Self-citation rate: 6.70%

Articles published:

Review time: 6 months

About the journal: Health Informatics Journal is an international peer-reviewed journal. All papers submitted to Health Informatics Journal are subject to peer review by members of a carefully appointed editorial board. The journal operates a conventional single-blind reviewing policy in which the reviewer's name is always concealed from the submitting author.