Assessing the Role of Large Language Models Between ChatGPT and DeepSeek in Asthma Education for Bilingual Individuals: Comparative Study

IF 3.8, Zone 3 (Medicine), Q2, MEDICAL INFORMATICS
Yaxin Liu, Fangfei Yu, Xiaofei Zhang, Xiaohan Tong, Kui Li, Weikuan Gu, Baiquan Yu
JMIR Medical Informatics, volume 13, e65365. Published 2025-08-13. Journal Article.
DOI: https://doi.org/10.2196/65365
PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12349887/pdf/
Citations: 0

Abstract

Assessing the Role of Large Language Models Between ChatGPT and DeepSeek in Asthma Education for Bilingual Individuals: Comparative Study.

Background: Asthma is a chronic inflammatory airway disease requiring long-term management. Artificial intelligence (AI)-driven tools such as large language models (LLMs) hold potential for enhancing patient education, especially for multilingual populations. However, comparative assessments of LLMs in disease-specific, bilingual health communication are limited.

Objective: This study aimed to evaluate and compare the performance of two advanced LLMs, ChatGPT-4o (OpenAI) and DeepSeek-v3 (DeepSeek AI), in providing bilingual (English and Chinese) education for patients with asthma, focusing on accuracy, completeness, clinical relevance, and language adaptability.

Methods: A total of 53 asthma-related questions were collected from real patient inquiries across 8 clinical domains. Each question was posed in both English and Chinese to ChatGPT-4o and DeepSeek-v3. Responses were evaluated using a 7D clinical quality framework (eg, completeness, consensus consistency, and reasoning ability) adapted from Google Health. Three respiratory clinicians performed blinded scoring evaluations. Descriptive statistics and Wilcoxon signed-rank tests were applied to compare performance across domains and against theoretical maximums.

Results: Both models demonstrated high overall quality in generating bilingual educational content. DeepSeek-v3 outperformed ChatGPT-4o in completeness and currency, particularly in treatment-related knowledge and symptom interpretation. ChatGPT-4o showed advantages in clarity and accessibility. In English responses, ChatGPT achieved perfect scores across 5 domains, but scored lower in clinical features (mean 3.78, SD 0.16; P=.02), treatment (mean 3.90, SD 0.05; P=.03), and differential diagnosis (mean 3.83, SD 0.29; P=.08).

Conclusions: ChatGPT-4o and DeepSeek-v3 each offer distinct strengths for bilingual asthma education. While ChatGPT is more suitable for general health education due to its expressive clarity, DeepSeek provides more up-to-date and comprehensive clinical content. Both models can serve as effective supplementary tools for patient self-management but cannot replace professional medical advice. Future AI health care systems should enhance clinical reasoning, ensure guideline currency, and integrate human oversight to optimize safety and accuracy.
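
The Methods mention descriptive statistics and Wilcoxon signed-rank tests against theoretical maximums. The abstract does not say which software or test variant the authors used, so the following is only a minimal sketch of one way such a comparison could be run: an exact two-sided one-sample Wilcoxon signed-rank test of a set of scores against a constant. The function names, the score values, and the maximum of 4.0 are all hypothetical and chosen for illustration; they are not data from the study.

```python
from itertools import product
from statistics import mean, stdev


def average_ranks(values):
    """Rank values in ascending order; tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        shared = (start + end) / 2 + 1  # mean of the 1-based ranks start+1 .. end+1
        for k in range(start, end + 1):
            ranks[order[k]] = shared
        start = end + 1
    return ranks


def wilcoxon_vs_constant(scores, reference):
    """Exact two-sided Wilcoxon signed-rank test of scores against a constant.

    Returns (statistic, p_value), where statistic = min(W+, W-).
    Scores equal to the reference are dropped, as in the classic procedure.
    Enumeration is exponential in n, so this is only suitable for small samples.
    """
    diffs = [s - reference for s in scores if s != reference]
    if not diffs:
        return 0.0, 1.0  # no non-zero differences: no evidence against the null
    ranks = average_ranks([abs(d) for d in diffs])
    w_plus = sum(r for r, d in zip(ranks, diffs) if d > 0)
    w_minus = sum(r for r, d in zip(ranks, diffs) if d < 0)
    statistic = min(w_plus, w_minus)
    total = sum(ranks)
    # Under the null, each rank is equally likely to carry a plus or minus sign.
    hits = 0
    for signs in product((0, 1), repeat=len(ranks)):
        plus = sum(r for r, s in zip(ranks, signs) if s)
        if min(plus, total - plus) <= statistic:
            hits += 1
    return statistic, hits / 2 ** len(ranks)


# Hypothetical per-question mean scores for one domain (NOT study data).
hypothetical_scores = [3.5, 3.75, 4.0, 3.5, 3.25, 3.75, 4.0, 3.5]
THEORETICAL_MAX = 4.0  # assumed top of the rating scale

stat, p_value = wilcoxon_vs_constant(hypothetical_scores, THEORETICAL_MAX)
print(f"mean={mean(hypothetical_scores):.2f}, SD={stdev(hypothetical_scores):.2f}")
print(f"W={stat}, exact two-sided p={p_value}")
```

Because no score can exceed the maximum, W+ is always 0 in this setting, so the p-value depends only on how many scores fall below the maximum. In practice a statistics library would normally be used instead of hand-rolled enumeration.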

Source journal
JMIR Medical Informatics (Medicine: Health Informatics)
CiteScore: 7.90
Self-citation rate: 3.10%
Articles published: 173
Review time: 12 weeks
About the journal: JMIR Medical Informatics (JMI, ISSN 2291-9694) is a top-rated, tier A journal that focuses on clinical informatics, big data in health and health care, decision support for health professionals, electronic health records, and eHealth infrastructures and implementation. It focuses on applied, translational research and has a broad readership that includes clinicians, CIOs, engineers, and industry and health informatics professionals. Published by JMIR Publications, publisher of the Journal of Medical Internet Research (JMIR), the leading eHealth/mHealth journal (Impact Factor 2016: 5.175), JMIR Med Inform has a slightly different scope (placing more emphasis on applications for clinicians and health professionals than on consumers/citizens, who are the focus of JMIR), publishes even faster, and also accepts papers that are more technical or more formative than what would be published in the Journal of Medical Internet Research.