Evaluation of validity and reliability of AI Chatbots as public sources of information on dental trauma.

IF 2.3 | Zone 3 (Medicine) | JCR Q2: DENTISTRY, ORAL SURGERY & MEDICINE
Ashish J Johnson, Tarun Kumar Singh, Aakash Gupta, Hariram Sankar, Ikroop Gill, Madhav Shalini, Neeraj Mohan
Journal: Dental Traumatology. Publication date: 2024-10-17. Publication type: Journal Article. DOI: 10.1111/edt.13000 (https://doi.org/10.1111/edt.13000)
Citations: 0

Abstract

Evaluation of validity and reliability of AI Chatbots as public sources of information on dental trauma.

Aim: This study aimed to assess the validity and reliability of AI chatbots, including Bing, ChatGPT 3.5, Google Gemini, and Claude AI, in addressing frequently asked questions (FAQs) related to dental trauma.

Methodology: A set of 30 FAQs was initially formulated by collecting responses from four AI chatbots. A panel comprising expert endodontists and maxillofacial surgeons then refined these to a final selection of 20 questions. Each question was entered into each chatbot three times, generating a total of 240 responses. These responses were evaluated using the Global Quality Score (GQS) on a 5-point Likert scale (5: strongly agree; 4: agree; 3: neutral; 2: disagree; 1: strongly disagree). Any disagreements in scoring were resolved through evidence-based discussions. The validity of the responses was determined by categorizing them as valid or invalid based on two thresholds: a low threshold (scores of ≥ 4 for all three responses) and a high threshold (scores of 5 for all three responses). A chi-squared test was used to compare the validity of the responses between the chatbots. Cronbach's alpha was calculated to assess the reliability by evaluating the consistency of repeated responses from each chatbot.
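
The scoring steps described in the methodology can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the authors' analysis code: the function names and all scores and counts below are hypothetical, and the abstract does not say which software was used. The sketch classifies a question as valid at the low threshold (all three GQS scores ≥ 4) or the high threshold (all three scores equal to 5), computes Cronbach's alpha by treating the three repeated runs as items and the questions as observations, and computes the chi-squared statistic for a chatbot-by-validity contingency table. The p-value is left out because it needs a chi-squared distribution function, which the standard library does not provide.

```python
from statistics import variance


def classify_validity(scores, threshold):
    """Return True if every repeated response meets the threshold.

    threshold="low"  -> all scores >= 4
    threshold="high" -> all scores == 5
    """
    if threshold == "low":
        return all(s >= 4 for s in scores)
    if threshold == "high":
        return all(s == 5 for s in scores)
    raise ValueError("threshold must be 'low' or 'high'")


def cronbach_alpha(rows):
    """Cronbach's alpha for a list of rows (one row per question,
    one column per repeated run), using sample variances."""
    k = len(rows[0])                      # number of repeated runs (items)
    if k < 2:
        raise ValueError("need at least two repeated runs")
    columns = list(zip(*rows))            # scores grouped by run
    item_var_sum = sum(variance(col) for col in columns)
    total_var = variance([sum(row) for row in rows])
    if total_var == 0:
        raise ValueError("total score variance is zero; alpha is undefined")
    return (k / (k - 1)) * (1 - item_var_sum / total_var)


def chi_square_statistic(table):
    """Pearson chi-squared statistic for a contingency table given as
    a list of rows of observed counts (e.g. [valid, invalid] per chatbot)."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / grand_total
            stat += (observed - expected) ** 2 / expected
    return stat


# Hypothetical GQS scores for one chatbot: 4 questions x 3 repeated runs.
example_scores = [
    [5, 5, 5],
    [4, 5, 4],
    [3, 4, 4],
    [5, 5, 4],
]

low_valid = [classify_validity(row, "low") for row in example_scores]
high_valid = [classify_validity(row, "high") for row in example_scores]
alpha = cronbach_alpha(example_scores)

# Hypothetical valid/invalid counts for two chatbots over 20 questions each.
example_table = [[15, 5], [9, 11]]
chi2 = chi_square_statistic(example_table)

print(low_valid, high_valid, round(alpha, 3), round(chi2, 3))
```

With the study's design, each chatbot would contribute a 20 x 3 score matrix (20 questions, 3 repeats), which is consistent with the reported total of 20 x 4 x 3 = 240 responses.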

Conclusion: The results indicate that the Claude AI chatbot demonstrated superior validity and reliability compared to ChatGPT and Google Gemini, whereas Bing was found to be less reliable. These findings underscore the need for authorities to establish strict guidelines to ensure the accuracy of medical information provided by AI chatbots.

Source journal
Dental Traumatology (Medicine: Dentistry and Oral Surgery)
CiteScore: 6.40
Self-citation rate: 32.00%
Articles published: 85
Review time: 6-12 weeks
Journal description: Dental Traumatology is an international journal that aims to convey scientific and clinical progress in all areas related to adult and pediatric dental traumatology. This includes the following topics:
- Epidemiology, Social Aspects, Education, Diagnostics
- Esthetics / Prosthetics / Restorative
- Evidence Based Traumatology & Study Design
- Oral & Maxillofacial Surgery / Transplant / Implant
- Pediatrics and Orthodontics
- Prevention and Sports Dentistry
- Endodontics and Periodontal Aspects
The journal's aim is to promote communication among clinicians, educators, researchers, and others interested in the field of dental traumatology.