Performance of ChatGPT 3.5 and 4 on U.S. dental examinations: the INBDE, ADAT, and DAT.

Impact factor: 2.1; JCR quartile: Q3 (Dentistry, Oral Surgery & Medicine)

Imaging Science in Dentistry. Pub date: 2024-09-01. Epub date: 2024-07-02. DOI: 10.5624/isd.20240037

Mahmood Dashti, Shohreh Ghasemi, Niloofar Ghadimi, Delband Hefzi, Azizeh Karimian, Niusha Zare, Amir Fahimipour, Zohaib Khurshid, Maryam Mohammadalizadeh Chafjiri, Sahar Ghaedsharaf

Imaging Science in Dentistry, volume 54(3), pages 271-275. Publication type: Journal Article. Full-text PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11450412/pdf/


Abstract

Purpose: Recent advancements in artificial intelligence (AI), particularly tools such as ChatGPT developed by OpenAI, a U.S.-based AI research organization, have transformed the healthcare and education sectors. This study investigated the effectiveness of ChatGPT in answering dentistry exam questions, demonstrating its potential to enhance professional practice and patient care.

Materials and methods: This study assessed the performance of ChatGPT 3.5 and 4 on U.S. dental exams - specifically, the Integrated National Board Dental Examination (INBDE), Dental Admission Test (DAT), and Advanced Dental Admission Test (ADAT) - excluding image-based questions. Using customized prompts, ChatGPT's answers were evaluated against official answer sheets.

Results: ChatGPT 3.5 and 4 were tested with 253 questions from the INBDE, ADAT, and DAT exams. For the INBDE, both versions achieved 80% accuracy in knowledge-based questions and 66-69% in case history questions. In ADAT, they scored 66-83% in knowledge-based and 76% in case history questions. ChatGPT 4 excelled on the DAT, with 94% accuracy in knowledge-based questions, 57% in mathematical analysis items, and 100% in comprehension questions, surpassing ChatGPT 3.5's rates of 83%, 31%, and 82%, respectively. The difference was significant for knowledge-based questions (P=0.009). Both versions showed similar patterns in incorrect responses.

Conclusion: Both ChatGPT 3.5 and 4 effectively handled knowledge-based, case history, and comprehension questions, with ChatGPT 4 being more reliable and surpassing the performance of 3.5. ChatGPT 4's perfect score in comprehension questions underscores its trainability in specific subjects. However, both versions exhibited weaker performance in mathematical analysis, suggesting this as an area for improvement.
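The scoring described in the abstract (model answers checked against an official answer sheet, with accuracy reported per question category and compared between versions) can be illustrated with a minimal sketch. Everything below is hypothetical: the question IDs, answer letters, category labels, and function names are placeholders, not the study's data or code. The abstract also does not say which statistical test produced P=0.009; Fisher's exact test is used here only as one plausible choice, and because both versions answered the same questions, a paired test such as McNemar's might fit equally well.

```python
from collections import defaultdict
from math import comb


def accuracy_by_category(answers, key, categories):
    """Return {category: fraction correct} for one model's answers.

    answers    -- {question_id: answer given by the model}
    key        -- {question_id: answer from the official answer sheet}
    categories -- {question_id: category label, e.g. "knowledge"}
    """
    correct = defaultdict(int)
    total = defaultdict(int)
    for qid, expected in key.items():
        cat = categories[qid]
        total[cat] += 1
        if answers.get(qid) == expected:
            correct[cat] += 1
    return {cat: correct[cat] / total[cat] for cat in total}


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher exact p-value for the 2x2 table [[a, b], [c, d]].

    Rows are the two models; columns are correct / incorrect counts.
    """
    row1, row2, col1 = a + b, c + d, a + c
    n = row1 + row2

    def prob(x):
        # Hypergeometric probability of x correct answers in row 1.
        return comb(row1, x) * comb(row2, col1 - x) / comb(n, col1)

    observed = prob(a)
    lo = max(0, col1 - row2)
    hi = min(row1, col1)
    # Sum the probabilities of all tables at least as extreme as the observed one.
    return sum(prob(x) for x in range(lo, hi + 1)
               if prob(x) <= observed * (1 + 1e-9))


# Hypothetical toy data, for illustration only.
answer_key = {"q1": "A", "q2": "C", "q3": "B", "q4": "D"}
question_category = {"q1": "knowledge", "q2": "knowledge",
                     "q3": "math", "q4": "math"}
model_answers = {"q1": "A", "q2": "C", "q3": "B", "q4": "A"}

print(accuracy_by_category(model_answers, answer_key, question_category))
print(fisher_exact_two_sided(3, 0, 0, 3))
```

With real data, the four arguments to `fisher_exact_two_sided` would be the correct and incorrect counts for each model within a single question category.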




Source journal

Imaging Science in Dentistry (Dentistry, Oral Surgery & Medicine)

CiteScore

2.90

Self-citation rate

11.10%

Number of publications