{"title":"评估患者教育中的人工智能:DeepSeek-V3与chatgpt - 40在回答腹腔镜胆囊切除术中的常见问题","authors":"Hilmi Anil Dincer, Dogukan Dogu","doi":"10.1111/ans.70198","DOIUrl":null,"url":null,"abstract":"<p><strong>Background: </strong>Artificial intelligence-based large language models (AI-based LLMs) have gained popularity over traditional search engines for obtaining medical information. However, the accuracy and reliability of these AI-generated medical insights remain a topic of debate. Recently, a new AI-based LLM, DeepSeek-V3, developed in East Asia, has been introduced. The aim of this study is to evaluate the appropriateness, accuracy, and readability of responses and the usability of these answers for patient education provided by ChatGPT-4o and DeepSeek-V3 AI-based LLMs to frequently asked questions by patients regarding laparoscopic cholecystectomy (LC).</p><p><strong>Methods: </strong>The 20 most frequently asked questions by patients regarding LC were presented to the DeepSeek-V3 and ChatGPT-4o chatbots. Before each question, the search history was deleted. The comprehensiveness of the responses was evaluated based on clinical experience by two board-certified general surgeons experienced in hepatobiliary surgery using a Likert scale. Paired sample t-test and Wilcoxon signed rank test were used. Inter-rater reliability was analyzed with Cohen's Kappa test.</p><p><strong>Results: </strong>The DeepSeek-V3 chatbot provided statistically significantly more suitable responses compared to ChatGPT-4o (p = 0.033). On the Likert scale, DeepSeek-V3 received a 5-point rating for 19 out of 20 questions (95%), whereas ChatGPT-4o achieved a 5-point rating for only 13 questions (65%). 
Based on the evaluation conducted according to the reviewers' clinical experience, DeepSeek-V3 provided statistically significantly more appropriate responses (p = 0.008).</p><p><strong>Conclusion: </strong>Released in January 2025, DeepSeek-V3 provides more suitable responses to patient inquiries regarding LC compared to ChatGPT-4o.</p>","PeriodicalId":8158,"journal":{"name":"ANZ Journal of Surgery","volume":" ","pages":""},"PeriodicalIF":1.6000,"publicationDate":"2025-06-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"Evaluating Artificial Intelligence in Patient Education: DeepSeek-V3 Versus ChatGPT-4o in Answering Common Questions on Laparoscopic Cholecystectomy.\",\"authors\":\"Hilmi Anil Dincer, Dogukan Dogu\",\"doi\":\"10.1111/ans.70198\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"<p><strong>Background: </strong>Artificial intelligence-based large language models (AI-based LLMs) have gained popularity over traditional search engines for obtaining medical information. However, the accuracy and reliability of these AI-generated medical insights remain a topic of debate. Recently, a new AI-based LLM, DeepSeek-V3, developed in East Asia, has been introduced. The aim of this study is to evaluate the appropriateness, accuracy, and readability of responses and the usability of these answers for patient education provided by ChatGPT-4o and DeepSeek-V3 AI-based LLMs to frequently asked questions by patients regarding laparoscopic cholecystectomy (LC).</p><p><strong>Methods: </strong>The 20 most frequently asked questions by patients regarding LC were presented to the DeepSeek-V3 and ChatGPT-4o chatbots. Before each question, the search history was deleted. The comprehensiveness of the responses was evaluated based on clinical experience by two board-certified general surgeons experienced in hepatobiliary surgery using a Likert scale. 
Paired sample t-test and Wilcoxon signed rank test were used. Inter-rater reliability was analyzed with Cohen's Kappa test.</p><p><strong>Results: </strong>The DeepSeek-V3 chatbot provided statistically significantly more suitable responses compared to ChatGPT-4o (p = 0.033). On the Likert scale, DeepSeek-V3 received a 5-point rating for 19 out of 20 questions (95%), whereas ChatGPT-4o achieved a 5-point rating for only 13 questions (65%). Based on the evaluation conducted according to the reviewers' clinical experience, DeepSeek-V3 provided statistically significantly more appropriate responses (p = 0.008).</p><p><strong>Conclusion: </strong>Released in January 2025, DeepSeek-V3 provides more suitable responses to patient inquiries regarding LC compared to ChatGPT-4o.</p>\",\"PeriodicalId\":8158,\"journal\":{\"name\":\"ANZ Journal of Surgery\",\"volume\":\" \",\"pages\":\"\"},\"PeriodicalIF\":1.6000,\"publicationDate\":\"2025-06-11\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"ANZ Journal of Surgery\",\"FirstCategoryId\":\"3\",\"ListUrlMain\":\"https://doi.org/10.1111/ans.70198\",\"RegionNum\":4,\"RegionCategory\":\"医学\",\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"Q3\",\"JCRName\":\"SURGERY\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"ANZ Journal of Surgery","FirstCategoryId":"3","ListUrlMain":"https://doi.org/10.1111/ans.70198","RegionNum":4,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q3","JCRName":"SURGERY","Score":null,"Total":0}
Evaluating Artificial Intelligence in Patient Education: DeepSeek-V3 Versus ChatGPT-4o in Answering Common Questions on Laparoscopic Cholecystectomy.
Background: Artificial intelligence-based large language models (AI-based LLMs) have gained popularity over traditional search engines as a means of obtaining medical information. However, the accuracy and reliability of AI-generated medical information remain a topic of debate. Recently, a new AI-based LLM developed in East Asia, DeepSeek-V3, was introduced. The aim of this study was to evaluate the appropriateness, accuracy, and readability of the responses provided by the ChatGPT-4o and DeepSeek-V3 AI-based LLMs to patients' frequently asked questions regarding laparoscopic cholecystectomy (LC), and the usability of these answers for patient education.
Methods: The 20 questions most frequently asked by patients regarding LC were presented to the DeepSeek-V3 and ChatGPT-4o chatbots, with the search history cleared before each question. The comprehensiveness of the responses was rated on a Likert scale, based on clinical experience, by two board-certified general surgeons experienced in hepatobiliary surgery. Paired-samples t-tests and Wilcoxon signed-rank tests were used for comparisons, and inter-rater reliability was assessed with Cohen's kappa.
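The statistical workflow described above (paired-samples t-test, Wilcoxon signed-rank test, Cohen's kappa) can be sketched in Python. This is an illustrative example only: the Likert ratings below are invented for demonstration and are not the study's data.

```python
# Illustrative sketch of the abstract's statistical workflow on made-up
# 5-point Likert ratings for 20 paired questions (NOT the study's data).
from scipy.stats import ttest_rel, wilcoxon

# Hypothetical ratings: one rater's scores for the same 20 questions.
deepseek_scores = [5] * 19 + [4]
chatgpt_scores = [5] * 13 + [4] * 7

# Paired-samples t-test on the matched question ratings.
t_stat, t_p = ttest_rel(deepseek_scores, chatgpt_scores)

# Wilcoxon signed-rank test (zero differences are dropped by default).
w_stat, w_p = wilcoxon(deepseek_scores, chatgpt_scores)

def cohens_kappa(rater1, rater2):
    """Cohen's kappa for two raters scoring the same items."""
    n = len(rater1)
    categories = sorted(set(rater1) | set(rater2))
    # Observed agreement: fraction of items both raters scored identically.
    p_o = sum(a == b for a, b in zip(rater1, rater2)) / n
    # Expected agreement under independence of the two raters.
    p_e = sum((rater1.count(c) / n) * (rater2.count(c) / n)
              for c in categories)
    return 1.0 if p_e == 1 else (p_o - p_e) / (1 - p_e)

# Hypothetical scores from the two surgeon raters for five items.
rater_a = [5, 5, 4, 5, 5]
rater_b = [5, 4, 4, 5, 5]
print(f"t-test p={t_p:.3f}  Wilcoxon p={w_p:.3f}  "
      f"kappa={cohens_kappa(rater_a, rater_b):.2f}")
```

In practice the Wilcoxon test is the safer default here, since 5-point Likert ratings are ordinal and heavily tied rather than normally distributed.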
Results: The DeepSeek-V3 chatbot provided significantly more suitable responses than ChatGPT-4o (p = 0.033). On the Likert scale, DeepSeek-V3 received a 5-point rating for 19 of 20 questions (95%), whereas ChatGPT-4o achieved a 5-point rating for only 13 questions (65%). In the evaluation based on the reviewers' clinical experience, DeepSeek-V3 likewise provided significantly more appropriate responses (p = 0.008).
Conclusion: Released in January 2025, DeepSeek-V3 provides more suitable responses to patient inquiries regarding LC compared to ChatGPT-4o.
Journal Introduction:
ANZ Journal of Surgery is published by Wiley on behalf of the Royal Australasian College of Surgeons to provide a medium for the publication of peer-reviewed original contributions related to clinical practice and/or research in all fields of surgery and related disciplines. It also provides a programme of continuing education for surgeons. All articles are peer-reviewed by at least two researchers expert in the field of the submitted paper.