The Performance of ChatGPT-4.0 and ChatGPT-4omni on Answering Thyroid Question: A Multicenter Study

IF 1.7 | Zone 3 (Medicine) | JCR Q2 (Surgery)

Journal of Surgical Research Pub Date : 2025-07-23 DOI:10.1016/j.jss.2025.06.066

Siyin Guo BS, Genpeng Li MD, Juxiang Gou BS, Yanping Gong MD, Wanjun Zhao MD, Zhiqiang Li MS, Xianwei Yang MD, Zhenni Liu MS, Zhihui Li MD, Jianyong Lei MD

Journal of Surgical Research, Volume 313, Pages 500-508. Full text: https://www.sciencedirect.com/science/article/pii/S0022480425004044

Citations: 0

Abstract

Introduction

Although ChatGPT-4.0 exhibits increasing potential in medical applications, its more recent version, ChatGPT-4omni, has not yet been evaluated for how well it responds to patient questions on thyroid health. In this study, the performance of ChatGPT-4.0 and ChatGPT-4omni in answering questions on the thyroid was examined.

Methods

To test the performance of ChatGPT-4.0 and ChatGPT-4omni, we first obtained 28 thyroid-related questions from the Huayitong app, a medical app officially released by West China Hospital of Sichuan University. We then added two interventional questions, for a total of 30 questions. On June 28, 2024, we entered these queries into ChatGPT-4.0 and ChatGPT-4omni in Chinese to generate 60 Chinese replies. Finally, from July 1 to 15, 2024, we asked 60 patients, 29 surgeons, and 37 nurses from 21 tertiary care units nationwide to rate the two sources’ responses on a 5-point Likert scale in terms of time, word count, response speed, accuracy, comprehensiveness, empathy, and satisfaction.

Results

Across the 30 questions, ChatGPT-4omni gave longer answers than ChatGPT-4.0 (750.50 [611.50-817.25] characters versus 437.30 [110.20] characters; P < 0.001), took less time to respond (20.68 [4.38] seconds versus 27.58 [7.22] seconds; P < 0.001), and generated text faster (34.26 [5.03] characters/second versus 15.69 [13.90-16.92] characters/second; P < 0.001). Patients, thyroid surgeons, and thyroid surgery nurses all rated the responses from ChatGPT-4omni as more accurate, comprehensive, empathetic, and satisfactory than those from ChatGPT-4.0 (all P values < 0.05).
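The response-speed figures are given in characters per second, which suggests a per-response ratio of answer length to response time. The following is a minimal sketch of that arithmetic, not the authors' analysis code: the function names and the sample data are hypothetical, and the abstract does not state which summary statistic each bracketed value represents, so the sketch simply computes both a mean with standard deviation and a median with quartiles.

```python
from statistics import mean, median, quantiles, stdev


def response_speed(char_count: int, seconds: float) -> float:
    """Return characters generated per second for a single response."""
    if seconds <= 0:
        raise ValueError("seconds must be positive")
    return char_count / seconds


def summarize(values: list[float]) -> dict:
    """Summarize a sample as mean/SD and median/quartiles."""
    # method="inclusive" treats the data as the full population of interest
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    return {
        "mean": mean(values),
        "sd": stdev(values),
        "median": median(values),
        "iqr": (q1, q3),
    }


# Hypothetical (character count, response time in seconds) pairs,
# NOT data from the study.
responses = [(400, 25.0), (500, 30.0), (450, 28.0), (380, 26.0)]

speeds = [response_speed(chars, secs) for chars, secs in responses]
summary = summarize(speeds)
print(f"mean speed: {summary['mean']:.2f} characters/second")
print(f"median speed: {summary['median']:.2f} characters/second")
```

Because speed is computed per response and then summarized, the reported speed need not equal the summary character count divided by the summary response time.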

Conclusions

ChatGPT-4omni outperformed ChatGPT-4.0 in answering common thyroid-related questions. However, further study and optimization are needed to achieve an efficient integration of ChatGPT in clinical settings.




Source journal

Journal of Surgical Research (Medicine: Surgery)

CiteScore

3.90

Self-citation rate

4.50%

Articles published

627

Review time

138 days

About the journal: The Journal of Surgical Research: Clinical and Laboratory Investigation publishes original articles concerned with clinical and laboratory investigations relevant to surgical practice and teaching. The journal emphasizes reports of clinical investigations or fundamental research bearing directly on surgical management that will be of general interest to a broad range of surgeons and surgical researchers. The articles presented need not have been the products of surgeons or of surgical laboratories. The Journal of Surgical Research also features review articles and special articles relating to educational, research, or social issues of interest to the academic surgical community.