{"title":"评估患者教育中的人工智能:DeepSeek-V3与chatgpt - 40在回答腹腔镜胆囊切除术中的常见问题","authors":"Hilmi Anil Dincer, Dogukan Dogu","doi":"10.1111/ans.70198","DOIUrl":null,"url":null,"abstract":"<p><strong>Background: </strong>Artificial intelligence-based large language models (AI-based LLMs) have gained popularity over traditional search engines for obtaining medical information. However, the accuracy and reliability of these AI-generated medical insights remain a topic of debate. Recently, a new AI-based LLM, DeepSeek-V3, developed in East Asia, has been introduced. The aim of this study is to evaluate the appropriateness, accuracy, and readability of responses and the usability of these answers for patient education provided by ChatGPT-4o and DeepSeek-V3 AI-based LLMs to frequently asked questions by patients regarding laparoscopic cholecystectomy (LC).</p><p><strong>Methods: </strong>The 20 most frequently asked questions by patients regarding LC were presented to the DeepSeek-V3 and ChatGPT-4o chatbots. Before each question, the search history was deleted. The comprehensiveness of the responses was evaluated based on clinical experience by two board-certified general surgeons experienced in hepatobiliary surgery using a Likert scale. Paired sample t-test and Wilcoxon signed rank test were used. Inter-rater reliability was analyzed with Cohen's Kappa test.</p><p><strong>Results: </strong>The DeepSeek-V3 chatbot provided statistically significantly more suitable responses compared to ChatGPT-4o (p = 0.033). On the Likert scale, DeepSeek-V3 received a 5-point rating for 19 out of 20 questions (95%), whereas ChatGPT-4o achieved a 5-point rating for only 13 questions (65%). 
Based on the evaluation conducted according to the reviewers' clinical experience, DeepSeek-V3 provided statistically significantly more appropriate responses (p = 0.008).</p><p><strong>Conclusion: </strong>Released in January 2025, DeepSeek-V3 provides more suitable responses to patient inquiries regarding LC compared to ChatGPT-4o.</p>","PeriodicalId":8158,"journal":{"name":"ANZ Journal of Surgery","volume":" ","pages":""},"PeriodicalIF":1.6000,"publicationDate":"2025-06-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"Evaluating Artificial Intelligence in Patient Education: DeepSeek-V3 Versus ChatGPT-4o in Answering Common Questions on Laparoscopic Cholecystectomy.\",\"authors\":\"Hilmi Anil Dincer, Dogukan Dogu\",\"doi\":\"10.1111/ans.70198\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"<p><strong>Background: </strong>Artificial intelligence-based large language models (AI-based LLMs) have gained popularity over traditional search engines for obtaining medical information. However, the accuracy and reliability of these AI-generated medical insights remain a topic of debate. Recently, a new AI-based LLM, DeepSeek-V3, developed in East Asia, has been introduced. The aim of this study is to evaluate the appropriateness, accuracy, and readability of responses and the usability of these answers for patient education provided by ChatGPT-4o and DeepSeek-V3 AI-based LLMs to frequently asked questions by patients regarding laparoscopic cholecystectomy (LC).</p><p><strong>Methods: </strong>The 20 most frequently asked questions by patients regarding LC were presented to the DeepSeek-V3 and ChatGPT-4o chatbots. Before each question, the search history was deleted. The comprehensiveness of the responses was evaluated based on clinical experience by two board-certified general surgeons experienced in hepatobiliary surgery using a Likert scale. 
Paired sample t-test and Wilcoxon signed rank test were used. Inter-rater reliability was analyzed with Cohen's Kappa test.</p><p><strong>Results: </strong>The DeepSeek-V3 chatbot provided statistically significantly more suitable responses compared to ChatGPT-4o (p = 0.033). On the Likert scale, DeepSeek-V3 received a 5-point rating for 19 out of 20 questions (95%), whereas ChatGPT-4o achieved a 5-point rating for only 13 questions (65%). Based on the evaluation conducted according to the reviewers' clinical experience, DeepSeek-V3 provided statistically significantly more appropriate responses (p = 0.008).</p><p><strong>Conclusion: </strong>Released in January 2025, DeepSeek-V3 provides more suitable responses to patient inquiries regarding LC compared to ChatGPT-4o.</p>\",\"PeriodicalId\":8158,\"journal\":{\"name\":\"ANZ Journal of Surgery\",\"volume\":\" \",\"pages\":\"\"},\"PeriodicalIF\":1.6000,\"publicationDate\":\"2025-06-11\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"ANZ Journal of Surgery\",\"FirstCategoryId\":\"3\",\"ListUrlMain\":\"https://doi.org/10.1111/ans.70198\",\"RegionNum\":4,\"RegionCategory\":\"医学\",\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"Q3\",\"JCRName\":\"SURGERY\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"ANZ Journal of Surgery","FirstCategoryId":"3","ListUrlMain":"https://doi.org/10.1111/ans.70198","RegionNum":4,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q3","JCRName":"SURGERY","Score":null,"Total":0}
Evaluating Artificial Intelligence in Patient Education: DeepSeek-V3 Versus ChatGPT-4o in Answering Common Questions on Laparoscopic Cholecystectomy.
Background: Artificial intelligence-based large language models (AI-based LLMs) have gained popularity over traditional search engines as a means of obtaining medical information. However, the accuracy and reliability of AI-generated medical information remain a topic of debate. Recently, a new AI-based LLM developed in East Asia, DeepSeek-V3, was introduced. The aim of this study was to evaluate the appropriateness, accuracy, and readability of the responses provided by the ChatGPT-4o and DeepSeek-V3 AI-based LLMs to patients' frequently asked questions regarding laparoscopic cholecystectomy (LC), and the usability of these answers for patient education.
Methods: The 20 questions most frequently asked by patients regarding LC were presented to the DeepSeek-V3 and ChatGPT-4o chatbots, with the search history cleared before each question. The comprehensiveness of the responses was rated on a Likert scale, based on clinical experience, by two board-certified general surgeons experienced in hepatobiliary surgery. Paired-samples t-tests and Wilcoxon signed-rank tests were used for comparisons, and inter-rater reliability was assessed with Cohen's kappa.
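The statistical workflow described above (paired-samples t-test, Wilcoxon signed-rank test, Cohen's kappa) can be sketched in Python. This is an illustrative example only: the Likert ratings below are invented for demonstration and are not the study's data.

```python
# Illustrative sketch of the abstract's statistical workflow on made-up
# 5-point Likert ratings for 20 paired questions (NOT the study's data).
from scipy.stats import ttest_rel, wilcoxon

# Hypothetical ratings: one rater's scores for the same 20 questions.
deepseek_scores = [5] * 19 + [4]
chatgpt_scores = [5] * 13 + [4] * 7

# Paired-samples t-test on the matched question ratings.
t_stat, t_p = ttest_rel(deepseek_scores, chatgpt_scores)

# Wilcoxon signed-rank test (zero differences are dropped by default).
w_stat, w_p = wilcoxon(deepseek_scores, chatgpt_scores)

def cohens_kappa(rater1, rater2):
    """Cohen's kappa for two raters scoring the same items."""
    n = len(rater1)
    categories = sorted(set(rater1) | set(rater2))
    # Observed agreement: fraction of items both raters scored identically.
    p_o = sum(a == b for a, b in zip(rater1, rater2)) / n
    # Expected agreement under independence of the two raters.
    p_e = sum((rater1.count(c) / n) * (rater2.count(c) / n)
              for c in categories)
    return 1.0 if p_e == 1 else (p_o - p_e) / (1 - p_e)

# Hypothetical scores from the two surgeon raters for five items.
rater_a = [5, 5, 4, 5, 5]
rater_b = [5, 4, 4, 5, 5]
print(f"t-test p={t_p:.3f}  Wilcoxon p={w_p:.3f}  "
      f"kappa={cohens_kappa(rater_a, rater_b):.2f}")
```

In practice the Wilcoxon test is the safer default here, since 5-point Likert ratings are ordinal and heavily tied rather than normally distributed.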
Results: The DeepSeek-V3 chatbot provided significantly more suitable responses than ChatGPT-4o (p = 0.033). On the Likert scale, DeepSeek-V3 received a 5-point rating for 19 of 20 questions (95%), whereas ChatGPT-4o achieved a 5-point rating for only 13 questions (65%). In the evaluation based on the reviewers' clinical experience, DeepSeek-V3 likewise provided significantly more appropriate responses (p = 0.008).
Conclusion: Released in January 2025, DeepSeek-V3 provides more suitable responses to patient inquiries regarding LC compared to ChatGPT-4o.
Journal Introduction:
ANZ Journal of Surgery is published by Wiley on behalf of the Royal Australasian College of Surgeons to provide a medium for the publication of peer-reviewed original contributions related to clinical practice and/or research in all fields of surgery and related disciplines. It also provides a programme of continuing education for surgeons. All articles are peer-reviewed by at least two researchers expert in the field of the submitted paper.