A Comparative Analysis of Responses of Artificial Intelligence Chatbots in Special Needs Dentistry.

Pediatric Dentistry, 46(5): 337-344. Pub Date: 2024-09-15
Rata Rokhshad, Mouada Fadul, Guihua Zhai, Kimberly Carr, Janice G Jackson, Ping Zhang
{"title":"A Comparative Analysis of Responses of Artificial Intelligence Chatbots in Special Needs Dentistry.","authors":"Rata Rokhshad, Mouada Fadul, Guihua Zhai, Kimberly Carr, Janice G Jackson, Ping Zhang","doi":"","DOIUrl":null,"url":null,"abstract":"<p><p><b>Purpose:</b> To evaluate the accuracy and consistency of chatbots in answering questions related to special needs dentistry. <b>Methods:</b> Nine publicly accessible chatbots, including Google Bard, ChatGPT 4, ChatGPT 3.5, Llama, Sage, Claude 2 100k, Claude-instant, Claude-instant-100k, and Google PaLM, were evaluated on their ability to answer a set of 25 true/false questions related to special needs dentistry and 15 questions for syndrome diagnosis based on their oral manifestations. Each chatbot was asked independently three times at a three-week interval from November to December 2023, and the responses were evaluated by dental professionals. The Wilcoxon exact test was used to compare accuracy rates among the chatbots while Cronbach's alpha was utilized to measure the consistency of the chatbots' responses. <b>Results:</b> Chatbots had an average accuracy of 55??4 percent in answering all questions, 37±6 percent in diagnosis, and 67±8 percent in answering true/false questions. No significant difference (P>0.05) in the accuracy proportion was detected between any pairwise chatbot comparison. All chatbots demonstrated acceptable reliability (Cronbach's alpha greater than 0.7), with Claude instant having the highest reliability of 0.93. <b>Conclusion:</b> Chatbots exhibit acceptable consistency in responding to questions related to special needs dentistry and better accuracy in responding to true/false questions than diagnostic questions. The clinical relevance is not fully established at this stage, but it may become a useful tool in the future.</p>","PeriodicalId":101357,"journal":{"name":"Pediatric dentistry","volume":"46 5","pages":"337-344"},"PeriodicalIF":0.0000,"publicationDate":"2024-09-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Pediatric dentistry","FirstCategoryId":"1085","ListUrlMain":"","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 0

Abstract

Purpose: To evaluate the accuracy and consistency of chatbots in answering questions related to special needs dentistry. Methods: Nine publicly accessible chatbots, including Google Bard, ChatGPT 4, ChatGPT 3.5, Llama, Sage, Claude 2 100k, Claude-instant, Claude-instant-100k, and Google PaLM, were evaluated on their ability to answer a set of 25 true/false questions related to special needs dentistry and 15 syndrome-diagnosis questions based on oral manifestations. Each chatbot was asked independently three times at a three-week interval from November to December 2023, and the responses were evaluated by dental professionals. The Wilcoxon exact test was used to compare accuracy rates among the chatbots, while Cronbach's alpha was used to measure the consistency of the chatbots' responses. Results: Chatbots had an average accuracy of 55±4 percent in answering all questions, 37±6 percent in diagnosis, and 67±8 percent in answering true/false questions. No significant difference (P>0.05) in accuracy was detected in any pairwise comparison of chatbots. All chatbots demonstrated acceptable reliability (Cronbach's alpha greater than 0.7), with Claude-instant having the highest reliability (0.93). Conclusion: Chatbots exhibit acceptable consistency in responding to questions related to special needs dentistry and better accuracy in responding to true/false questions than diagnostic questions. The clinical relevance is not fully established at this stage, but chatbots may become a useful tool in the future.
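
The statistical approach named in the Methods can be illustrated with a minimal sketch. This is not the authors' code: the 0/1 correctness matrices below are hypothetical, the scoring layout (rows = questions, columns = the three repeated runs) is an assumption, and SciPy's signed-rank `wilcoxon` is used as a stand-in for the exact Wilcoxon test reported in the paper.

```python
import numpy as np
from scipy.stats import wilcoxon

def cronbach_alpha(scores: np.ndarray) -> float:
    """Cronbach's alpha for an (n_questions x n_runs) matrix of 0/1 correctness scores."""
    k = scores.shape[1]                         # number of repeated runs (3 in the study)
    run_vars = scores.var(axis=0, ddof=1)       # variance of each run across questions
    total_var = scores.sum(axis=1).var(ddof=1)  # variance of per-question score totals
    return (k / (k - 1)) * (1 - run_vars.sum() / total_var)

# Hypothetical data: 40 questions answered three times by two chatbots.
rng = np.random.default_rng(0)
bot_a = rng.integers(0, 2, size=(40, 3))
bot_b = rng.integers(0, 2, size=(40, 3))

# Consistency of one chatbot across its three runs.
print("alpha(bot A):", round(cronbach_alpha(bot_a), 2))

# Pairwise comparison of per-question accuracy (mean over the three runs).
stat, p = wilcoxon(bot_a.mean(axis=1), bot_b.mean(axis=1))
print("Wilcoxon p-value:", round(p, 3))
```

Under this reading, an alpha above 0.7 for a chatbot would correspond to the "acceptable reliability" threshold cited in the Results, and a pairwise Wilcoxon p-value above 0.05 to the reported absence of significant accuracy differences between chatbots.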
