Artificial intelligence in dental education: ChatGPT's performance on the periodontic in-service examination

IF 4.2 · Zone 2 (Medicine) · Q1 DENTISTRY, ORAL SURGERY & MEDICINE

Journal of periodontology · Pub Date: 2024-01-10 · DOI: 10.1002/JPER.23-0514

Arman Danesh, Hirad Pazouki, Farzad Danesh, Arsalan Danesh, Saynur Vardar-Sengul

Journal of periodontology 95(7):682-687. Publication type: Journal Article. Full text: https://onlinelibrary.wiley.com/doi/10.1002/JPER.23-0514

Citations: 0

Abstract

Background

ChatGPT (Chat Generative Pre-Trained Transformer) has a remarkable capacity to generate human-like output, which makes it an appealing learning tool for healthcare students worldwide. Nevertheless, the chatbot's responses may be inaccurate, posing a serious risk of misinformation. ChatGPT's capabilities should be examined in every area of healthcare education, including dentistry and its specialties, to understand the potential for misinformation associated with the chatbot's use as a learning tool. Our investigation explores ChatGPT's knowledge base in periodontology by evaluating the chatbot's performance on questions from an in-service examination administered by the American Academy of Periodontology (AAP).

Methods

ChatGPT3.5 and ChatGPT4 were evaluated on 311 multiple-choice questions obtained from the 2023 in-service examination administered by the AAP. The dataset of in-service examination questions was accessed through Nova Southeastern University's Department of Periodontology. Our study excluded questions containing an image as ChatGPT does not accept image inputs.

Results

ChatGPT3.5 and ChatGPT4 correctly answered 57.9% and 73.6% of questions, respectively, on the 2023 Periodontics In-Service Written Examination. A two-tailed t test was used to compare independent sample means, and a two-tailed χ² test was used to compare sample proportions. A p value below 0.05 was deemed statistically significant.
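As an illustration of how a χ² comparison of two proportions works, the following minimal sketch applies a Pearson χ² test (1 degree of freedom, no continuity correction) to the reported accuracies. It is not the authors' analysis: the counts of correct answers are reconstructed here by rounding 57.9% and 73.6% of 311, the abstract does not state whether 311 is the number of questions before or after image-based questions were excluded, and the function name `chi_square_2x2` is hypothetical. Because both models answered the same questions, a paired test could also be argued for; this sketch simply treats the two samples as independent.

```python
import math


def chi_square_2x2(a_correct, a_total, b_correct, b_total):
    """Pearson chi-square test for two proportions (no continuity correction)."""
    observed = [
        [a_correct, a_total - a_correct],
        [b_correct, b_total - b_correct],
    ]
    grand_total = a_total + b_total
    row_totals = [a_total, b_total]
    col_totals = [observed[0][j] + observed[1][j] for j in range(2)]

    chi2 = 0.0
    for i in range(2):
        for j in range(2):
            # Expected count under the null hypothesis of equal proportions.
            expected = row_totals[i] * col_totals[j] / grand_total
            chi2 += (observed[i][j] - expected) ** 2 / expected

    # For 1 degree of freedom, the upper-tail probability of the
    # chi-square distribution equals erfc(sqrt(chi2 / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value


# Assumption: counts reconstructed from the rounded percentages in the abstract.
N_QUESTIONS = 311
gpt35_correct = round(0.579 * N_QUESTIONS)
gpt4_correct = round(0.736 * N_QUESTIONS)

chi2, p_value = chi_square_2x2(gpt35_correct, N_QUESTIONS, gpt4_correct, N_QUESTIONS)
print(f"chi2 = {chi2:.2f}, p = {p_value:.2g}, significant at 0.05: {p_value < 0.05}")
```

Under these assumptions the difference between the two models falls well below the 0.05 threshold stated above.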

Conclusion

Although ChatGPT4 showed higher proficiency than ChatGPT3.5, both chatbot models leave considerable room for misinformation in their responses relating to periodontology. The findings encourage residents to scrutinize the periodontic information generated by ChatGPT to account for the chatbot's current limitations.
