ChatGPT's performance in dentistry and allergy-immunology assessments: a comparative study.

Q3 Medicine
Alexander Fuchs, Tina Trachsel, Roland Weiger, Florin Eggmann
Swiss dental journal, vol. 134, no. 2, pp. 1-17. Journal Article. Published 2023-10-04. DOI: 10.61872/sdj-2024-06-01 (https://doi.org/10.61872/sdj-2024-06-01)

Abstract

ChatGPT's performance in dentistry and allergy-immunology assessments: a comparative study.

Large language models (LLMs) such as ChatGPT have potential applications in healthcare, including dentistry. Priming, the practice of providing LLMs with initial, relevant information, is an approach to improve their output quality. This study aimed to evaluate the performance of ChatGPT 3 and ChatGPT 4 on self-assessment questions for dentistry, through the Swiss Federal Licensing Examination in Dental Medicine (SFLEDM), and allergy and clinical immunology, through the European Examination in Allergy and Clinical Immunology (EEAACI). The second objective was to assess the impact of priming on ChatGPT's performance. The SFLEDM and EEAACI multiple-choice questions from the University of Bern's Institute for Medical Education platform were administered to both ChatGPT versions, with and without priming. Performance was analyzed based on correct responses. The statistical analysis included Wilcoxon rank sum tests (alpha=0.05). The average accuracy rates in the SFLEDM and EEAACI assessments were 63.3% and 79.3%, respectively. Both ChatGPT versions performed better on EEAACI than SFLEDM, with ChatGPT 4 outperforming ChatGPT 3 across all tests. ChatGPT 3's performance exhibited a significant improvement with priming for both EEAACI (p=0.017) and SFLEDM (p=0.024) assessments. For ChatGPT 4, the priming effect was significant only in the SFLEDM assessment (p=0.038). The performance disparity between SFLEDM and EEAACI assessments underscores ChatGPT's varying proficiency across different medical domains, likely tied to the nature and amount of training data available in each field. Priming can be a tool for enhancing output, especially in earlier LLMs. Advancements from ChatGPT 3 to 4 highlight the rapid developments in LLM technology. Yet, their use in critical fields such as healthcare must remain cautious owing to LLMs' inherent limitations and risks.

Source journal: Swiss dental journal (Dentistry: Dentistry (miscellaneous))
CiteScore: 1.00
Self-citation rate: 0.00%
Articles published: 0
Journal description: Founded in 1891 and read by nearly all dentists practising in Switzerland, the SWISS DENTAL JOURNAL SSO is the scientific publication of the Swiss Dental Association (SSO). It publishes articles that are recognized for continuing education and reports on current developments in dentistry and on the SSO's professional policy.