Performance of artificial intelligence chatbots in National dental licensing examination

IF 3.1 · CAS Tier 3 (Medicine) · JCR Q1 · DENTISTRY, ORAL SURGERY & MEDICINE
Chad Chan-Chia Lin , Jui-Sheng Sun , Chin-Hao Chang , Yu-Han Chang , Jenny Zwei-Chieng Chang
DOI: 10.1016/j.jds.2025.05.012
Journal: Journal of Dental Sciences, Volume 20, Issue 4, Pages 2307-2314
Publication date: 2025-05-27 (Journal Article)
URL: https://www.sciencedirect.com/science/article/pii/S1991790225001606
Citations: 0

Abstract

Performance of artificial intelligence chatbots in National dental licensing examination

Background/purpose

The Taiwan dental board exams comprehensively assess dental candidates across twenty distinct subjects, spanning foundational knowledge to clinical fields, using multiple-choice single-answer exams with a minimum passing score of 60 %. This study assesses the performance of artificial intelligence (AI)-powered chatbots (specifically ChatGPT3.5, Gemini, and Claude2), categorized as Large Language Models (LLMs), on these exams from 2021 to 2023.

Materials and methods

A total of 2699 multiple-choice questions spanning eight subjects in basic dentistry and twelve in clinical dentistry were analyzed. Questions involving images and tables were excluded. Statistical analyses were conducted using McNemar's test. Furthermore, annual results of LLMs were compared with the qualification rates of human candidates to provide additional context.
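For readers unfamiliar with the statistic: McNemar's test compares two paired binary classifiers by looking only at the discordant pairs, i.e., questions one chatbot answered correctly and the other missed. A minimal sketch of the continuity-corrected chi-square form is below; the discordant counts in the example are hypothetical, as the paper's abstract does not report them.

```python
import math

def mcnemar_chi2(b: int, c: int) -> tuple[float, float]:
    """Continuity-corrected McNemar chi-square statistic and p-value
    (1 degree of freedom) for paired binary outcomes.

    b: questions model A answered correctly but model B missed
    c: questions model B answered correctly but model A missed
    """
    stat = (abs(b - c) - 1) ** 2 / (b + c)
    # Survival function of chi-square with 1 df: P(X >= stat) = erfc(sqrt(stat / 2))
    p_value = math.erfc(math.sqrt(stat / 2))
    return stat, p_value

# Hypothetical discordant counts (NOT from the paper): suppose that out of
# 2699 questions, Claude2 alone was correct on 320 and ChatGPT3.5 alone on 170.
stat, p = mcnemar_chi2(320, 170)
print(f"chi2 = {stat:.2f}, p = {p:.2e}")
```

When both discordant counts are small, an exact binomial version of the test is usually preferred over this chi-square approximation.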

Results

Claude2 demonstrated the highest overall accuracy (54.89 %) on the Taiwan national dental licensing examinations, outperforming ChatGPT3.5 (49.33 %) and Gemini (44.63 %), with statistically significant differences in performance across models. In the basic dentistry domain, Claude2 scored 59.73 %, followed by ChatGPT3.5 (54.87 %) and Gemini (47.35 %). Notably, Claude2 excelled in biochemistry (73.81 %) and oral microbiology (88.89 %), while ChatGPT3.5 also performed strongly in oral microbiology (80.56 %). In the clinical dentistry domain, Claude2 led with a score of 52.45 %, surpassing ChatGPT3.5 (46.54 %) and Gemini (43.26 %), and showed strong results in dental public health (65.81 %). Despite these achievements, none of the LLMs attained passing scores overall.

Conclusion

None of the models achieved passing scores, highlighting their strengths in foundational knowledge but limitations in clinical reasoning.
Source journal
Journal of Dental Sciences (Medicine – Dentistry & Oral Surgery)
CiteScore: 5.10
Self-citation rate: 14.30%
Annual article count: 348
Review turnaround: 6 days
Journal introduction: The Journal of Dental Sciences (JDS), published quarterly, is the official and open access publication of the Association for Dental Sciences of the Republic of China (ADS-ROC). The precedent journal of the JDS is the Chinese Dental Journal (CDJ), which was already covered by MEDLINE in 1988. As the CDJ continued to prove its importance in the region, the ADS-ROC decided to move to the international community by publishing an English journal; hence the birth of the JDS in 2006. The JDS has been indexed in SCI Expanded since 2008. It is also indexed in Scopus, EMCare, ScienceDirect, and SIIC databases. The topics covered by the JDS include all fields of basic and clinical dentistry. Manuscripts focusing on endemic diseases such as dental caries and periodontal diseases in particular regions of any country, as well as oral pre-cancers, oral cancers, and oral submucous fibrosis related to the betel nut chewing habit, are also considered for publication. In addition, the JDS publishes articles on the efficacy of new treatment modalities for oral verrucous hyperplasia or early oral squamous cell carcinoma.