Quality of Information Provided by Artificial Intelligence Chatbots Surrounding the Management of Vestibular Schwannomas: A Comparative Analysis Between ChatGPT-4 and Claude 2.

IF 1.9 · Q3 (JCR: Clinical Neurology) · Zone 3, Medicine (CAS)
Otology & Neurotology, 2025-04-01 (Epub 2025-02-04), pp. 432-436. DOI: 10.1097/MAO.0000000000004410
Daniele Borsetto, Egidio Sia, Patrick Axon, Neil Donnelly, James R Tysome, Lukas Anschuetz, Daniele Bernardeschi, Vincenzo Capriotti, Per Caye-Thomasen, Niels Cramer West, Isaac D Erbele, Sebastiano Franchella, Annalisa Gatto, Jeanette Hess-Erga, Henricus P M Kunst, John P Marinelli, Richard Mannion, Benedict Panizza, Franco Trabalzini, Rupert Obholzer, Luigi Angelo Vaira, Jerry Polesel, Fabiola Giudici, Matthew L Carlson, Giancarlo Tirelli, Paolo Boscolo-Rizzo
Citations: 0

Abstract

Objective: To examine the quality of information provided by artificial intelligence platforms ChatGPT-4 and Claude 2 surrounding the management of vestibular schwannomas.

Study design: Cross-sectional.

Setting: Skull base surgeons from multiple centers and countries.

Intervention: Thirty-six questions regarding vestibular schwannoma management were tested. Artificial intelligence responses were subsequently evaluated by 19 lateral skull base surgeons using the Quality Assessment of Medical Artificial Intelligence (QAMAI) questionnaire, assessing "Accuracy," "Clarity," "Relevance," "Completeness," "Sources," and "Usefulness."

Main outcome measure: The scores of the answers from both chatbots were collected and analyzed using the Student t test. Analysis of responses grouped by stakeholders was performed with the McNemar test. The Stuart-Maxwell test was used to compare reading levels between chatbots. The intraclass correlation coefficient was calculated.
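The per-question comparison described above can be sketched as follows. This is a hypothetical illustration, not the authors' analysis code: the scores are invented, and a paired t test is one plausible reading of "Student t test" here, since the same 36 questions were posed to both chatbots.

```python
from scipy import stats

# Hypothetical QAMAI total scores for six illustrative questions,
# one score per question for each chatbot (paired by question).
gpt4_scores = [24.0, 22.5, 26.0, 21.0, 25.5, 23.0]
claude2_scores = [21.5, 22.0, 23.5, 20.0, 24.0, 22.5]

# Paired (dependent-samples) t test: each question is rated for both chatbots,
# so the observations are naturally paired.
t_stat, p_value = stats.ttest_rel(gpt4_scores, claude2_scores)
print(f"t = {t_stat:.3f}, p = {p_value:.4f}")
```

A positive t statistic here would indicate higher scores for the first group (ChatGPT-4 in this sketch); the study reports such per-question comparisons across all 36 questions.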

Results: ChatGPT-4 demonstrated significantly higher quality than Claude 2 in 14 of 36 (38.9%) questions, whereas higher-quality scores for Claude 2 were observed in only 2 (5.6%) answers. The chatbots exhibited variation across the dimensions of "Accuracy," "Clarity," "Completeness," "Relevance," and "Usefulness," with ChatGPT-4 demonstrating statistically significantly superior performance. However, no statistically significant difference was found in the assessment of "Sources." Additionally, ChatGPT-4 provided information at a significantly lower reading grade level.
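The "reading grade level" result above is typically obtained with a readability formula; the abstract does not say which one was used. As a sketch, the widely used Flesch-Kincaid grade level can be computed from sentence, word, and syllable counts (the syllable counter below is a crude vowel-group heuristic, adequate only for illustration):

```python
import re

def count_syllables(word):
    # Crude heuristic: count runs of vowels, then drop a trailing silent "e".
    word = word.lower()
    n = len(re.findall(r"[aeiouy]+", word))
    if word.endswith("e") and n > 1:
        n -= 1
    return max(n, 1)

def flesch_kincaid_grade(text):
    # FK grade = 0.39 * (words/sentences) + 11.8 * (syllables/words) - 15.59
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    syllables = sum(count_syllables(w) for w in words)
    return 0.39 * len(words) / len(sentences) + 11.8 * syllables / len(words) - 15.59

text = "Vestibular schwannomas are benign tumors. They grow slowly."
grade = flesch_kincaid_grade(text)
print(round(grade, 2))
```

A lower grade means the text is readable by a broader audience, which is the sense in which ChatGPT-4's lower reading grade level is reported as an advantage.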

Conclusions: Artificial intelligence platforms failed to consistently provide accurate information surrounding the management of vestibular schwannomas, although ChatGPT-4 achieved significantly higher scores in most analyzed parameters. These findings demonstrate the potential for significant misinformation for patients seeking information through these platforms.

Source journal: Otology & Neurotology (Medicine — Otorhinolaryngology)
CiteScore: 3.80 · Self-citation rate: 14.30% · Articles per year: 509 · Review time: 3-6 weeks
Journal description: Otology & Neurotology publishes original articles relating to both clinical and basic science aspects of otology, neurotology, and cranial base surgery. As the foremost journal in its field, it has become the favored place for publishing the best of new science relating to the human ear and its diseases. The broadly international character of its contributing authors, editorial board, and readership provides the Journal its decidedly global perspective.