{"title":"Information from digital and human sources: A comparison of chatbot and clinician responses to orthodontic questions.","authors":"Ufuk Metin, Merve Goymen","doi":"10.1016/j.ajodo.2025.04.008","DOIUrl":null,"url":null,"abstract":"<p><strong>Introduction: </strong>This study aimed to investigate whether artificial intelligence (AI)-based chatbots can be used as reliable adjunct tools in orthodontic practice by evaluating chatbot responses and comparing them to those of clinicians with varying levels of knowledge.</p><p><strong>Methods: </strong>Large language model-based chatbots (ChatGPT-4, ChatGPT-4o, Microsoft Copilot, Google Gemini 1.5 Pro, and Claude 3.5 Sonnet) and clinicians (dental students, general dentists, and orthodontists; n = 30) were included. The groups were asked 40 true and false questions, and the accuracy rate for each question was assessed by comparing it to the predetermined answer key. The total score was converted into a percentage. The Kruskal-Wallis test and Dunn's multiple comparison tests were used to compare accuracy rates. The consistency of the answers given by chatbots at 3 different times was assessed by Cronbach α.</p><p><strong>Results: </strong>The accuracy ratio scores for students were significantly lower than Microsoft Copilot (P = 0.029), Claude 3.5 Sonnet (P = 0.023), ChatGPT-4o (P = 0.005), and orthodontists (P = 0.001). For dentists, the accuracy ratio scores were found to be significantly lower than ChatGPT-4o (P = 0.019) and orthodontists (P = 0.001). The accuracy rate of ChatGPT-4o was closest to that of orthodontists, whereas the accuracy rates of ChatGPT-4, Microsoft Copilot, Claude 3.5 Sonnet, and Google Gemini 1.5 Pro were lower than orthodontists but higher than general dentists. Although ChatGPT-4 demonstrated a high degree of consistency in its responses, evidenced by a high Cronbach α value (α = 0.867), ChatGPT-4o (α = 0.256) and Claude 3.5 Sonnet (α = 0.256) were the least consistent chatbots.</p><p><strong>Conclusions: </strong>The study found that orthodontists had the highest accuracy rate, whereas AI-based chatbots had a higher accuracy rate compared with dental students and general dentists. However, ChatGPT-4 gave the most consistent answers, whereas ChatGPT-4o and Claude 3.5 Sonnet showed the least consistency. AI-based chatbots can be useful for patient education and general orthodontic guidance, but a lack of consistency in responses can lead to the risk of misinformation.</p>","PeriodicalId":50806,"journal":{"name":"American Journal of Orthodontics and Dentofacial Orthopedics","volume":" ","pages":""},"PeriodicalIF":2.7000,"publicationDate":"2025-05-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"American Journal of Orthodontics and Dentofacial Orthopedics","FirstCategoryId":"3","ListUrlMain":"https://doi.org/10.1016/j.ajodo.2025.04.008","RegionNum":2,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"DENTISTRY, ORAL SURGERY & MEDICINE","Score":null,"Total":0}
引用次数: 0
Abstract
Introduction: This study aimed to investigate whether artificial intelligence (AI)-based chatbots can be used as reliable adjunct tools in orthodontic practice by evaluating chatbot responses and comparing them to those of clinicians with varying levels of knowledge.
Methods: Large language model-based chatbots (ChatGPT-4, ChatGPT-4o, Microsoft Copilot, Google Gemini 1.5 Pro, and Claude 3.5 Sonnet) and clinicians (dental students, general dentists, and orthodontists; n = 30) were included. The groups were asked 40 true-or-false questions, and accuracy on each question was assessed against a predetermined answer key. The total score was converted into a percentage. The Kruskal-Wallis test and Dunn's multiple comparison test were used to compare accuracy rates. The consistency of the answers given by each chatbot at 3 different times was assessed by Cronbach α.
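For readers who want to reproduce this style of analysis, the following is a minimal Python sketch of the pipeline the abstract describes (Kruskal-Wallis test, Dunn's post-hoc comparisons, and Cronbach α), assuming the scipy and scikit-posthocs packages are available; all data, group sizes, and group names below are simulated placeholders, not the study's actual responses.

```python
import numpy as np
from scipy.stats import kruskal
import scikit_posthocs as sp

rng = np.random.default_rng(0)

# Per-respondent accuracy percentages over the 40 true-or-false items,
# one array per group (simulated placeholder values, not study data).
groups = {
    "students":      rng.uniform(50, 75, 10),
    "dentists":      rng.uniform(60, 85, 10),
    "orthodontists": rng.uniform(80, 100, 10),
    "chatbot":       rng.uniform(75, 95, 3),   # 3 repeated query sessions
}

# Kruskal-Wallis test across all groups, as in the abstract.
h, p = kruskal(*groups.values())
print(f"Kruskal-Wallis H = {h:.2f}, P = {p:.4f}")

# Dunn's post-hoc pairwise comparisons (scikit-posthocs package).
dunn_p = sp.posthoc_dunn(list(groups.values()), p_adjust="bonferroni")
print(dunn_p)

def cronbach_alpha(scores: np.ndarray) -> float:
    """Cronbach's alpha for a (questions x occasions) score matrix:
    alpha = k/(k-1) * (1 - sum of per-occasion variances / variance of row totals)."""
    k = scores.shape[1]
    occasion_var = scores.var(axis=0, ddof=1).sum()
    total_var = scores.sum(axis=1).var(ddof=1)
    return k / (k - 1) * (1 - occasion_var / total_var)

# 40 questions x 3 query sessions of 0/1 correctness for one chatbot.
scores = rng.integers(0, 2, size=(40, 3))
print(f"Cronbach alpha = {cronbach_alpha(scores):.3f}")
```

In this sketch, Cronbach α treats the 3 query sessions as the "items" scored over the 40 questions, which is one common way to quantify the repeat-response consistency the study reports; the paper's exact scoring matrix is not public.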
Results: The accuracy scores of students were significantly lower than those of Microsoft Copilot (P = 0.029), Claude 3.5 Sonnet (P = 0.023), ChatGPT-4o (P = 0.005), and orthodontists (P = 0.001). The accuracy scores of general dentists were significantly lower than those of ChatGPT-4o (P = 0.019) and orthodontists (P = 0.001). The accuracy rate of ChatGPT-4o was closest to that of orthodontists, whereas the accuracy rates of ChatGPT-4, Microsoft Copilot, Claude 3.5 Sonnet, and Google Gemini 1.5 Pro were lower than that of orthodontists but higher than that of general dentists. Whereas ChatGPT-4 demonstrated a high degree of consistency in its responses (Cronbach α = 0.867), ChatGPT-4o (α = 0.256) and Claude 3.5 Sonnet (α = 0.256) were the least consistent chatbots.
Conclusions: Orthodontists had the highest accuracy rate, and AI-based chatbots achieved higher accuracy rates than dental students and general dentists. However, ChatGPT-4 gave the most consistent answers, whereas ChatGPT-4o and Claude 3.5 Sonnet were the least consistent. AI-based chatbots can be useful for patient education and general orthodontic guidance, but inconsistent responses carry a risk of misinformation.
About the journal:
Published for more than 100 years, the American Journal of Orthodontics and Dentofacial Orthopedics remains the leading orthodontic resource. It is the official publication of the American Association of Orthodontists, its constituent societies, the American Board of Orthodontics, and the College of Diplomates of the American Board of Orthodontics. Each month its readers have access to original peer-reviewed articles that examine all phases of orthodontic treatment. Illustrated throughout, the publication includes tables, color photographs, and statistical data. Coverage includes successful diagnostic procedures, imaging techniques, bracket and archwire materials, extraction and impaction concerns, orthognathic surgery, TMJ disorders, removable appliances, and adult therapy.