Performance of ChatGPT-3.5 and ChatGPT-4o in the Japanese National Dental Examination.

IF 1.4 | CAS Tier 4 (Medicine) | JCR Q3: Dentistry, Oral Surgery & Medicine
Osamu Uehara, Tetsuro Morikawa, Fumiya Harada, Nodoka Sugiyama, Yuko Matsuki, Daichi Hiraki, Hinako Sakurai, Takashi Kado, Koki Yoshida, Yukie Murata, Hirofumi Matsuoka, Toshiyuki Nagasawa, Yasushi Furuichi, Yoshihiro Abiko, Hiroko Miura
{"title":"Performance of ChatGPT-3.5 and ChatGPT-4o in the Japanese National Dental Examination.","authors":"Osamu Uehara, Tetsuro Morikawa, Fumiya Harada, Nodoka Sugiyama, Yuko Matsuki, Daichi Hiraki, Hinako Sakurai, Takashi Kado, Koki Yoshida, Yukie Murata, Hirofumi Matsuoka, Toshiyuki Nagasawa, Yasushi Furuichi, Yoshihiro Abiko, Hiroko Miura","doi":"10.1002/jdd.13766","DOIUrl":null,"url":null,"abstract":"<p><strong>Objectives: </strong>In this study, we compared the performance of ChatGPT-3.5 to that of ChatGPT-4o in the context of the Japanese National Dental Examination, which assesses clinical reasoning skills and dental knowledge, to determine their potential usefulness in dental education.</p><p><strong>Methods: </strong>ChatGPT's performance was assessed using 1399 (55% of the exam) of 2520 questions from the Japanese National Dental Examinations (111-117). The 1121 excluded questions (45% of the exam) contained figures or tables that ChatGPT could not recognize. The questions were categorized into 18 different subjects based on dental specialty. Statistical analysis was performed using SPSS software, with McNemar's test applied to assess differences in performance.</p><p><strong>Results: </strong>A significant improvement was noted in the percentage of correct answers from ChatGPT-4o (84.63%) compared with those from ChatGPT-3.5 (45.46%), demonstrating enhanced reliability and subject knowledge. ChatGPT-4o consistently outperformed ChatGPT-3.5 across all dental subjects, with significant improvements in subjects such as oral surgery, pathology, pharmacology, and microbiology. Heatmap analysis revealed that ChatGPT-4o provided more stable and higher correct answer rates, especially for complex subjects.</p><p><strong>Conclusions: </strong>This study found that advanced natural language processing models, such as ChatGPT-4o, potentially have sufficiently advanced clinical reasoning skills and dental knowledge to function as a supplementary tool in dental education and exam preparation.</p>","PeriodicalId":50216,"journal":{"name":"Journal of Dental Education","volume":null,"pages":null},"PeriodicalIF":1.4000,"publicationDate":"2024-11-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Journal of Dental Education","FirstCategoryId":"3","ListUrlMain":"https://doi.org/10.1002/jdd.13766","RegionNum":4,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q3","JCRName":"DENTISTRY, ORAL SURGERY & MEDICINE","Score":null,"Total":0}
引用次数: 0

Abstract

Objectives: In this study, we compared the performance of ChatGPT-3.5 to that of ChatGPT-4o in the context of the Japanese National Dental Examination, which assesses clinical reasoning skills and dental knowledge, to determine their potential usefulness in dental education.

Methods: ChatGPT's performance was assessed using 1399 (55%) of the 2520 questions from the 111th-117th Japanese National Dental Examinations. The 1121 excluded questions (45%) contained figures or tables that ChatGPT could not recognize. The questions were categorized into 18 subjects based on dental specialty. Statistical analysis was performed in SPSS, with McNemar's test used to assess differences in performance.
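
The authors ran the paired comparison in SPSS; as a rough, non-authoritative illustration, an equivalent McNemar's test could be sketched in Python with statsmodels. The 2x2 counts below are hypothetical (only the marginal accuracies are chosen to match the reported 45.46% and 84.63%); they are not the study's data.

```python
# Minimal sketch of a McNemar comparison of two models answering the same questions.
# This is NOT the authors' SPSS workflow; the cell counts are hypothetical.
from statsmodels.stats.contingency_tables import mcnemar

# 2x2 table of paired outcomes over the same 1399 questions:
# rows = ChatGPT-3.5 (correct, incorrect), cols = ChatGPT-4o (correct, incorrect)
table = [[600, 36],     # 3.5 correct & 4o correct, 3.5 correct & 4o incorrect
         [584, 179]]    # 3.5 incorrect & 4o correct, both incorrect

# Chi-square form with continuity correction is reasonable for counts this large
result = mcnemar(table, exact=False, correction=True)
print(f"McNemar statistic = {result.statistic:.2f}, p = {result.pvalue:.4g}")
```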

Results: The percentage of correct answers improved significantly with ChatGPT-4o (84.63%) compared with ChatGPT-3.5 (45.46%), demonstrating enhanced reliability and subject knowledge. ChatGPT-4o consistently outperformed ChatGPT-3.5 across all dental subjects, with significant improvements in subjects such as oral surgery, pathology, pharmacology, and microbiology. Heatmap analysis showed that ChatGPT-4o achieved more stable and higher correct answer rates, especially in complex subjects.
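
As a hedged sketch of what such a heatmap analysis might look like, the snippet below computes per-subject accuracy for both models and renders it as a heatmap. The file name, column names, and plotting library are assumptions for illustration, not details taken from the paper.

```python
# Illustrative per-subject accuracy heatmap (hypothetical data layout).
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt

# Assumed layout: one row per question with columns
# "subject", "gpt35_correct", "gpt4o_correct" (0/1 indicators)
df = pd.read_csv("jnde_responses.csv")  # hypothetical file

# Mean of the 0/1 indicators per subject = accuracy; scale to percent
accuracy = (
    df.groupby("subject")[["gpt35_correct", "gpt4o_correct"]]
      .mean()
      .mul(100)
      .rename(columns={"gpt35_correct": "ChatGPT-3.5", "gpt4o_correct": "ChatGPT-4o"})
)

sns.heatmap(accuracy, annot=True, fmt=".1f", cmap="viridis",
            cbar_kws={"label": "Correct answers (%)"})
plt.title("Correct answer rate by subject")
plt.tight_layout()
plt.show()
```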

Conclusions: This study found that advanced natural language processing models, such as ChatGPT-4o, potentially have sufficiently advanced clinical reasoning skills and dental knowledge to function as a supplementary tool in dental education and exam preparation.

Source journal: Journal of Dental Education (Medicine - Dentistry, Oral Surgery & Medicine)
CiteScore: 3.50
Self-citation rate: 21.70%
Annual articles: 274
Review time: 3-8 weeks
Journal description: The Journal of Dental Education (JDE) is a peer-reviewed monthly journal that publishes a wide variety of educational and scientific research in dental, allied dental and advanced dental education. Published continuously by the American Dental Education Association since 1936 and internationally recognized as the premier journal for academic dentistry, the JDE publishes articles on such topics as curriculum reform, education research methods, innovative educational and assessment methodologies, faculty development, community-based dental education, student recruitment and admissions, professional and educational ethics, dental education around the world and systematic reviews of educational interest. The JDE is one of the top scholarly journals publishing the most important work in oral health education today; it celebrated its 80th anniversary in 2016.