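Performance assessment of artificial intelligence chatbots (ChatGPT-4 and Copilot) for sharing insights on 3D-printed orthodontic appliances: A cross-sectional study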

Impact factor: 1.8 | JCR Q2 | Dentistry, Oral Surgery & Medicine
Asma Muhammad Yousuf, Fizzah Ikram, Munnal Gulzar, Rashna Hoshang Sukhia, Mubassar Fida
Journal: International Orthodontics, vol. 23, no. 3, Article 100992
Published: 2025-02-24
DOI: 10.1016/j.ortho.2025.100992
URL: https://www.sciencedirect.com/science/article/pii/S1761722725000270
Citations: 0

Abstract

Performance assessment of artificial intelligence chatbots (ChatGPT-4 and Copilot) for sharing insights on 3D-printed orthodontic appliances: A cross-sectional study

Objective

To evaluate and compare the performance of OpenAI's ChatGPT-4 and Microsoft Copilot in providing information on 3D-printed orthodontic appliances, with a focus on the accuracy, completeness of the content, and response generation time.

Methods

This cross-sectional study proceeded in five stages. Three orthodontists first drafted 125 questions on 3D-printed orthodontic appliances, of which a panel of senior orthodontists selected 105 for inclusion in the study. These questions were then organized into 15 distinct domains. Both chatbots were presented with the questions under consistent conditions, using the same laptop and internet setup, and response times were recorded with a stopwatch. The responses were anonymized and evaluated by seven experienced orthodontists, who scored accuracy and completeness using standardized tools. The evaluators reached a consensus on each score through discussion to ensure reliability.

Results

Spearman's correlation revealed a moderate to strong negative correlation between accuracy and completeness for both chatbots (p ≤ 0.001). The negative correlation observed between accuracy and completeness scores, particularly prominent in Copilot, indicates a trade-off between these qualities in some responses. Mann-Whitney U tests confirmed significant differences in accuracy and completeness between the chatbots (p ≤ 0.001), though response time differences were not statistically significant (p = 0.204). Cohen's Kappa results implied little to no consistency between the two models on the assessed parameters (p > 0.05).
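
As a rough illustration of the three statistics named above, the following Python sketch computes Spearman's rank correlation, the Mann-Whitney U statistic (with a normal-approximation p-value), and Cohen's Kappa using only the standard library. The function names and the sample scores are hypothetical and are not taken from the study; the abstract does not state which software or settings the authors used, so this should be read as one possible way to run such an analysis, not as their method.

```python
import math
from collections import Counter


def average_ranks(values):
    """Rank values from 1..n; tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # sorted positions i..j are 0-based
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman_rho(x, y):
    """Spearman's rho: Pearson correlation of the rank-transformed data.

    Undefined (division by zero) if either input is constant.
    """
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = math.sqrt(sum((a - mx) ** 2 for a in rx))
    sy = math.sqrt(sum((b - my) ** 2 for b in ry))
    return cov / (sx * sy)


def mann_whitney_u(a, b):
    """Return (U for sample a, two-sided p-value).

    The p-value uses the tie-corrected normal approximation without a
    continuity correction, which is only reasonable for larger samples.
    """
    n1, n2 = len(a), len(b)
    pooled = list(a) + list(b)
    ranks = average_ranks(pooled)
    u1 = sum(ranks[:n1]) - n1 * (n1 + 1) / 2
    n = n1 + n2
    tie_term = sum(t ** 3 - t for t in Counter(pooled).values())
    sigma = math.sqrt(n1 * n2 / 12 * ((n + 1) - tie_term / (n * (n - 1))))
    z = (u1 - n1 * n2 / 2) / sigma
    p = math.erfc(abs(z) / math.sqrt(2))
    return u1, p


def cohen_kappa(r1, r2):
    """Cohen's kappa: agreement between two raters beyond chance."""
    n = len(r1)
    observed = sum(a == b for a, b in zip(r1, r2)) / n
    c1, c2 = Counter(r1), Counter(r2)
    expected = sum(c1[k] * c2[k] for k in c1) / n ** 2
    return (observed - expected) / (1 - expected)


if __name__ == "__main__":
    # Hypothetical per-question scores, invented for illustration only.
    bot_a_accuracy = [5, 5, 4, 5, 4, 5, 3, 4]
    bot_a_completeness = [2, 1, 3, 2, 3, 1, 3, 2]
    bot_b_accuracy = [3, 2, 4, 3, 2, 3, 4, 2]

    print("Spearman rho:", spearman_rho(bot_a_accuracy, bot_a_completeness))
    print("Mann-Whitney U, p:", mann_whitney_u(bot_a_accuracy, bot_b_accuracy))
    print("Cohen's kappa:", cohen_kappa(bot_a_accuracy, bot_b_accuracy))
```

In this sketch a negative rho would correspond to the accuracy/completeness trade-off described above, and a kappa near zero would correspond to little agreement between the two chatbots' scores. For real analyses, an established statistics package with exact or corrected p-values is generally preferable to a hand-rolled approximation.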

Conclusion

ChatGPT-4 outperformed Microsoft Copilot in accuracy and completeness, providing more precise and comprehensive information on 3D-printed orthodontic appliances and demonstrating a greater ability to handle complex and detailed requests in this area.
Source journal: International Orthodontics
Category: Dentistry, Oral Surgery & Medicine
CiteScore: 2.50
Self-citation rate: 13.30%
Articles published: 71
Review time: 26 days
Journal description: A reference journal in orthodontics and related disciplines ("Your reference in dentofacial orthopedics"). International Orthodontics is addressed to orthodontists, dentists, stomatologists, maxillofacial surgeons and facial plastic surgeons, as well as their assistants.