Evaluating large language models for mild cognitive impairment among older adults: A bilingual comparison of ChatGPT, Gemini, and Kimi.
Authors: Yexuan Xiao, Qianhui Pan, Haoyuan Liu, Yilin He, Yuhe Zhang, Nan Jiang
Journal: Health Informatics Journal, vol. 31, no. 3, article no. 14604582251381240 (impact factor 2.3; JCR Q2, Health Care Sciences & Services)
DOI: 10.1177/14604582251381240 (https://doi.org/10.1177/14604582251381240)
Publication date: 2025-07-01 (Epub 2025-09-16)
Publication type: Journal Article
Objective: To evaluate large language models (LLMs) in managing mild cognitive impairment (MCI) and supporting nonspecialist healthcare professionals and care partners, comparing English and Chinese responses.

Methods: Seventy-two MCI-related questions were submitted to ChatGPT-4o, Gemini, and Kimi. Responses were assessed for accuracy, comprehensibility, specificity, and actionability using a 5-point Likert scale. Statistical analyses included intraclass correlation coefficients and Mann-Whitney U tests.

Results: LLMs performed best in the symptoms and diagnosis domain (M = 4.11 ± 0.15). Healthcare professionals' needs were better met than those of care partners, particularly in comprehensibility and actionability (p < .001). English responses were significantly more comprehensible and specific than Chinese responses (p < .001).

Conclusion: This study highlights the potential of LLMs like ChatGPT, Gemini, and Kimi in supporting MCI management, especially in diagnosis and providing actionable insights. However, their performance varied across languages and user groups, with English responses generally more effective than Chinese. The findings emphasize the need for culturally and linguistically adapted LLMs to enhance accuracy and usability. Future research should focus on expanding user diversity, improving adaptability, and incorporating region-specific data to optimize LLMs for MCI care.
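The abstract names Mann-Whitney U tests for comparing ratings but does not say which software or test variant was used. As a rough illustration only, the sketch below compares two groups of 5-point Likert ratings with a two-sided Mann-Whitney U test, using the normal approximation with a tie correction. The function name `mann_whitney_u` and the two score lists are hypothetical and are not taken from the study.

```python
import math
from collections import Counter


def mann_whitney_u(x, y):
    """Two-sided Mann-Whitney U test (normal approximation, tie-corrected).

    Returns (U statistic for x, z score, two-sided p-value).
    Illustrative sketch only; not the procedure used by the study's authors.
    """
    n1, n2 = len(x), len(y)
    pooled = sorted(list(x) + list(y))

    # Assign each distinct value the average (mid) rank of its tied block.
    midrank = {}
    i = 0
    while i < len(pooled):
        j = i
        while j < len(pooled) and pooled[j] == pooled[i]:
            j += 1
        midrank[pooled[i]] = (i + 1 + j) / 2  # mean of ranks i+1 .. j
        i = j

    rank_sum_x = sum(midrank[v] for v in x)
    u_x = rank_sum_x - n1 * (n1 + 1) / 2

    # Mean and tie-corrected variance of U under the null hypothesis.
    n = n1 + n2
    tie_term = sum(t ** 3 - t for t in Counter(pooled).values())
    mean_u = n1 * n2 / 2
    var_u = n1 * n2 / 12 * ((n + 1) - tie_term / (n * (n - 1)))

    if var_u == 0:  # every rating identical: no evidence of a difference
        return u_x, 0.0, 1.0

    z = (u_x - mean_u) / math.sqrt(var_u)
    p = math.erfc(abs(z) / math.sqrt(2))  # equals 2 * (1 - Phi(|z|))
    return u_x, z, p


# Hypothetical comprehensibility ratings (1-5), invented for illustration.
english_scores = [5, 4, 5, 4, 4, 5, 3, 5]
chinese_scores = [3, 4, 3, 2, 4, 3, 3, 4]

u, z, p = mann_whitney_u(english_scores, chinese_scores)
print(f"U = {u}, z = {z:.2f}, p = {p:.4f}")
```

A rank-based test suits Likert ratings because they are ordinal and heavily tied. With samples this small, an exact test would usually be preferable to the normal approximation shown here.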
About the journal:
Health Informatics Journal is an international peer-reviewed journal. All papers submitted to Health Informatics Journal are subject to peer review by members of a carefully appointed editorial board. The journal operates a conventional single-blind reviewing policy in which the reviewer’s name is always concealed from the submitting author.