Maurice Salem, Duygu Karasan, Marta Revilla-León, Abdul B Barmak, Irena Sailer
{"title":"Performance of Artificial Intelligence-Based Chatbots (ChatGPT-3.5 and ChatGPT-4.0) Answering the International Team of Implantology Exam Questions.","authors":"Maurice Salem, Duygu Karasan, Marta Revilla-León, Abdul B Barmak, Irena Sailer","doi":"10.1111/jerd.13496","DOIUrl":null,"url":null,"abstract":"<p><strong>Aim: </strong>This study aims to compare the performance of licensed dentists and two versions of ChatGPT (v.3.5 and v.4.0) in answering the International Team for Implantology (ITI) certification exam questions in implant dentistry.</p><p><strong>Materials and methods: </strong>The study involved 93 licensed dentists and the two chatbot versions answering 48 text-only multiple-choice questions from the ITI implant certification exam. The 48 questions passed through ChatGPT-3.5 and ChatGPT-4 93 times, and then the data were collected on an Excel sheet (Excel version 2024, Microsoft). Pearson correlation matrix was used to analyze the linear relationship among the tested groups. Additionally, inter- and intraoperator reliability was analyzed using Cronbach's alpha coefficient. One-way Welch's ANOVA and Tukey post-hoc tests were used to determine any significant differences among the groups tested on the exam scores obtained.</p><p><strong>Results: </strong>Licensed dentists obtained a higher score on the test compared to ChatGPT-3.5, while ChatGPT-4.0 and licensed dentists performed similarly. ChatGPT 4.0 resulted in significantly higher scores than ChatGPT-3.5. All groups were able to obtain scores high enough to pass the exam.</p><p><strong>Conclusion: </strong>Both ChatGPT-3.5 and ChatGPT-4.0 are powerful tools that can assist and guide dental licensed dentists and patients. ChatGPT-4.0 showed better results than ChatGPT-3.5; however, more studies should be conducted including new chatbots that are more sophisticated, with the ability to interpret videos and images-chatbots that were not available when this study was performed.</p>","PeriodicalId":15988,"journal":{"name":"Journal of Esthetic and Restorative Dentistry","volume":" ","pages":""},"PeriodicalIF":3.2000,"publicationDate":"2025-06-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Journal of Esthetic and Restorative Dentistry","FirstCategoryId":"3","ListUrlMain":"https://doi.org/10.1111/jerd.13496","RegionNum":3,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"DENTISTRY, ORAL SURGERY & MEDICINE","Score":null,"Total":0}
Citations: 0
Abstract
Aim: This study aims to compare the performance of licensed dentists and two versions of ChatGPT (v.3.5 and v.4.0) in answering the International Team for Implantology (ITI) certification exam questions in implant dentistry.
Materials and methods: The study involved 93 licensed dentists and the two chatbot versions answering 48 text-only multiple-choice questions from the ITI implant certification exam. The 48 questions were submitted to ChatGPT-3.5 and ChatGPT-4.0 93 times each, and the responses were collected in an Excel sheet (Excel version 2024, Microsoft). A Pearson correlation matrix was used to analyze the linear relationships among the tested groups. Additionally, inter- and intra-operator reliability was analyzed using Cronbach's alpha coefficient. Welch's one-way ANOVA and Tukey post hoc tests were used to identify significant differences in exam scores among the groups.
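As a rough illustration of this analysis pipeline (not the authors' actual code), the sketch below re-creates the four statistical steps in Python on invented score data; the group names, score distributions, and random values are hypothetical, and only the methods named in the abstract (Pearson correlation, Cronbach's alpha, Welch's ANOVA, Tukey HSD) come from the source.

```python
# Hypothetical re-creation of the statistical workflow described above.
# All numbers below are simulated for illustration only.
import numpy as np
import pandas as pd
from scipy import stats
from statsmodels.stats.oneway import anova_oneway

rng = np.random.default_rng(0)

# Simulated exam scores for 93 attempts per group (hypothetical values).
scores = pd.DataFrame({
    "dentists":  rng.normal(36, 4, 93).round(),
    "chatgpt35": rng.normal(32, 4, 93).round(),
    "chatgpt40": rng.normal(37, 4, 93).round(),
})

# 1. Pearson correlation matrix: linear relationships among the groups.
print(scores.corr(method="pearson"))

# 2. Cronbach's alpha: internal consistency across the repeated measurements.
def cronbach_alpha(items: pd.DataFrame) -> float:
    k = items.shape[1]
    item_vars = items.var(axis=0, ddof=1)
    total_var = items.sum(axis=1).var(ddof=1)
    return k / (k - 1) * (1 - item_vars.sum() / total_var)

print("Cronbach's alpha:", cronbach_alpha(scores))

# 3. Welch's one-way ANOVA: compares group means without assuming
#    equal variances across groups.
welch = anova_oneway(
    [scores[c] for c in scores.columns],
    use_var="unequal",
    welch_correction=True,
)
print(f"Welch ANOVA: F = {welch.statistic:.3f}, p = {welch.pvalue:.4f}")

# 4. Tukey HSD post hoc test: identifies which pairs of groups differ.
print(stats.tukey_hsd(scores["dentists"], scores["chatgpt35"], scores["chatgpt40"]))
```

Welch's ANOVA is the natural choice here because human and chatbot score distributions need not share a variance; the Tukey test then localizes any overall difference to specific group pairs.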
Results: Licensed dentists scored higher on the test than ChatGPT-3.5, while ChatGPT-4.0 and licensed dentists performed similarly. ChatGPT-4.0 scored significantly higher than ChatGPT-3.5. All groups achieved scores high enough to pass the exam.
Conclusion: Both ChatGPT-3.5 and ChatGPT-4.0 are powerful tools that can assist and guide licensed dentists and patients. ChatGPT-4.0 showed better results than ChatGPT-3.5; however, further studies should include newer, more sophisticated chatbots capable of interpreting videos and images, which were not available when this study was performed.
Journal Introduction:
The Journal of Esthetic and Restorative Dentistry (JERD) is the longest-standing peer-reviewed journal devoted solely to advancing the knowledge and practice of esthetic dentistry. Its goal is to provide the very latest evidence-based information in the realm of contemporary interdisciplinary esthetic dentistry through high-quality clinical papers, sound research reports, and educational features.
The range of topics covered in the journal includes:
- Interdisciplinary esthetic concepts
- Implants
- Conservative adhesive restorations
- Tooth whitening
- Prosthodontic materials and techniques
- Dental materials
- Orthodontic, periodontal and endodontic esthetics
- Esthetics related research
- Innovations in esthetics