ChatGPT's performance in dentistry and allergy-immunology assessments: a comparative study.

JCR quartile: Q3 (Medicine)
Swiss dental journal, Pub Date: 2023-10-06
Alexander Fuchs, Tina Trachsel, Roland Weiger, Florin Eggmann
Volume 134, issue 5. Publication type: Journal Article.

Abstract

Large language models (LLMs) such as ChatGPT have potential applications in healthcare, including dentistry. Priming, the practice of providing LLMs with initial, relevant information, is an approach to improve their output quality. This study aimed to evaluate the performance of ChatGPT 3 and ChatGPT 4 on self-assessment questions for dentistry, through the Swiss Federal Licensing Examination in Dental Medicine (SFLEDM), and allergy and clinical immunology, through the European Examination in Allergy and Clinical Immunology (EEAACI). The second objective was to assess the impact of priming on ChatGPT's performance. The SFLEDM and EEAACI multiple-choice questions from the University of Bern's Institute for Medical Education platform were administered to both ChatGPT versions, with and without priming. Performance was analyzed based on correct responses. The statistical analysis included Wilcoxon rank sum tests (α=0.05). The average accuracy rates in the SFLEDM and EEAACI assessments were 63.3% and 79.3%, respectively. Both ChatGPT versions performed better on EEAACI than SFLEDM, with ChatGPT 4 outperforming ChatGPT 3 across all tests. ChatGPT 3's performance exhibited a significant improvement with priming for both EEAACI (p=0.017) and SFLEDM (p=0.024) assessments. For ChatGPT 4, the priming effect was significant only in the SFLEDM assessment (p=0.038). The performance disparity between SFLEDM and EEAACI assessments underscores ChatGPT's varying proficiency across different medical domains, likely tied to the nature and amount of training data available in each field. Priming can be a tool for enhancing output, especially in earlier LLMs. Advancements from ChatGPT 3 to 4 highlight the rapid developments in LLM technology. Yet, their use in critical fields such as healthcare must remain cautious owing to LLMs' inherent limitations and risks.
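The abstract names Wilcoxon rank sum tests at α=0.05 but does not say how the observations were defined or which variant of the test was run. As a minimal sketch only, the following shows how such a comparison between an unprimed and a primed condition could be computed, assuming a two-sided test with the normal approximation and tie correction. The function name `rank_sum_test` and the score lists are hypothetical illustrations, not the study's data or code.

```python
import math
from statistics import NormalDist


def rank_sum_test(x, y):
    """Two-sided Wilcoxon rank sum (Mann-Whitney U) test.

    Uses the normal approximation with tie correction and no
    continuity correction. Returns (U statistic for x, p-value).
    """
    n1, n2 = len(x), len(y)
    # Pool both samples, remembering which group each value came from.
    pooled = sorted([(v, 0) for v in x] + [(v, 1) for v in y])
    ranks = [0.0] * len(pooled)
    tie_term = 0
    i = 0
    while i < len(pooled):
        # Find the run of tied values starting at position i.
        j = i
        while j < len(pooled) and pooled[j][0] == pooled[i][0]:
            j += 1
        t = j - i
        midrank = (i + 1 + j) / 2  # average of ranks i+1 .. j
        for k in range(i, j):
            ranks[k] = midrank
        tie_term += t**3 - t
        i = j
    rank_sum_x = sum(r for r, (_, g) in zip(ranks, pooled) if g == 0)
    u_x = rank_sum_x - n1 * (n1 + 1) / 2
    n = n1 + n2
    mean_u = n1 * n2 / 2
    var_u = n1 * n2 / 12 * ((n + 1) - tie_term / (n * (n - 1)))
    if var_u == 0:
        return u_x, 1.0  # all values identical: no evidence of a difference
    z = (u_x - mean_u) / math.sqrt(var_u)
    p = 2 * (1 - NormalDist().cdf(abs(z)))
    return u_x, p


# Hypothetical accuracy scores (percent correct per run), NOT study data.
unprimed = [60, 62, 58, 61, 59]
primed = [66, 68, 65, 70, 67]

ALPHA = 0.05  # significance level stated in the abstract
u, p = rank_sum_test(unprimed, primed)
print(f"U = {u}, p = {p:.4f}, significant at alpha={ALPHA}: {p < ALPHA}")
```

With samples this small, an exact test would normally be preferred over the normal approximation; the approximation is used here only to keep the sketch dependency-free.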

Source journal
Swiss dental journal
Subject area: Dentistry (miscellaneous)
CiteScore: 1.00
Self-citation rate: 0.00%
Articles published: 0
Journal description: Founded in 1891 and read by nearly all dentists practising in Switzerland, the SWISS DENTAL JOURNAL SSO is the scientific publication organ of the Swiss Dental Association SSO. It publishes articles that are recognised for continuing education and reports on current developments in dentistry and on the SSO's professional policy.