A comparison of quality and readability of Artificial Intelligence chatbots in triage for head and neck cancer

IF 1.7 · CAS Zone 4 (Medicine) · Q2 OTORHINOLARYNGOLOGY
Taylor Kring, Soumil Prasad, Supriya Dadi, Eric Sokhn, Elizabeth Franzmann
{"title":"A comparison of quality and readability of Artificial Intelligence chatbots in triage for head and neck cancer","authors":"Taylor Kring ,&nbsp;Soumil Prasad ,&nbsp;Supriya Dadi ,&nbsp;Eric Sokhn ,&nbsp;Elizabeth Franzmann","doi":"10.1016/j.amjoto.2025.104710","DOIUrl":null,"url":null,"abstract":"<div><h3>Objective</h3><div>Head and neck cancers (HNCs) are a significant global health concern, contributing to substantial morbidity and mortality. AI-powered chatbots such as ChatGPT, Google Gemini, Microsoft Copilot, and Open Evidence are increasingly used by patients seeking health information. While these tools provide immediate access to medical content, concerns remain regarding their reliability, readability, and potential impact on patient outcomes.</div></div><div><h3>Methods</h3><div>Responses to 25 patient-like HNC symptom queries were assessed using four leading AI platforms: ChatGPT, Google Gemini, Microsoft Copilot, and Open Evidence. Responses were evaluated using modified DISCERN criteria for quality and SMOG scoring for readability, with ANOVA and post hoc analysis conducted afterward.</div></div><div><h3>Results</h3><div>Microsoft Copilot achieved the highest mean DISCERN score of 41.40 (95 % CI: 40.31 to 42.49) and the lowest mean SMOG reading levels of 12.56 (95 % CI: 11.82 to 13.31), outperforming ChatGPT, Google Gemini, and Open Evidence in overall quality and accessibility (p &lt; .001). Open Evidence scored lowest in both quality averaging 30.52 (95 % CI: 27.52 to 33.52) and readability of 17.49 (95 % CI: 16.66 to 18.31), reflecting a graduate reading level.</div></div><div><h3>Conclusion</h3><div>Significant variability exists in the readability and quality of AI-generated responses to HNC-related queries, highlighting the need for platform-specific validation and oversight to ensure accurate, patient-centered communication.</div></div><div><h3>Level of evidence</h3><div>Our study is a cross-sectional analysis that evaluates chatbot responses using established grading tools. This aligns best with level 4 evidence.</div></div>","PeriodicalId":7591,"journal":{"name":"American Journal of Otolaryngology","volume":"46 5","pages":"Article 104710"},"PeriodicalIF":1.7000,"publicationDate":"2025-07-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"American Journal of Otolaryngology","FirstCategoryId":"3","ListUrlMain":"https://www.sciencedirect.com/science/article/pii/S0196070925001139","RegionNum":4,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q2","JCRName":"OTORHINOLARYNGOLOGY","Score":null,"Total":0}
Citations: 0

Abstract

Objective

Head and neck cancers (HNCs) are a significant global health concern, contributing to substantial morbidity and mortality. AI-powered chatbots such as ChatGPT, Google Gemini, Microsoft Copilot, and Open Evidence are increasingly used by patients seeking health information. While these tools provide immediate access to medical content, concerns remain regarding their reliability, readability, and potential impact on patient outcomes.

Methods

Responses to 25 patient-like HNC symptom queries were assessed across four leading AI platforms: ChatGPT, Google Gemini, Microsoft Copilot, and Open Evidence. Quality was evaluated with modified DISCERN criteria and readability with SMOG scoring; differences between platforms were then compared by ANOVA with post hoc analysis.
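For context, SMOG is a published readability formula: grade level = 1.0430 × √(polysyllables × 30 / sentences) + 3.1291. The study does not say which implementation was used, so the Python sketch below is only illustrative; the `smog_grade` and `count_syllables` helpers are hypothetical names, and the syllable counter is a crude vowel-group heuristic.

```python
import re
from math import sqrt

def count_syllables(word: str) -> int:
    # Crude heuristic: each run of consecutive vowels counts as one syllable.
    return max(1, len(re.findall(r"[aeiouy]+", word.lower())))

def smog_grade(text: str) -> float:
    # SMOG grade = 1.0430 * sqrt(polysyllables * (30 / sentences)) + 3.1291.
    # The formal SMOG procedure samples 30 sentences; shorter texts (such as
    # a single chatbot reply) use this normalized form and are less reliable.
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    polysyllables = sum(1 for w in words if count_syllables(w) >= 3)
    return 1.0430 * sqrt(polysyllables * (30 / len(sentences))) + 3.1291

# Example: score a short chatbot-style reply to a symptom query.
reply = ("A persistent neck lump lasting more than two weeks deserves "
         "medical attention. An otolaryngologist can examine it directly.")
print(f"Approximate SMOG grade: {smog_grade(reply):.2f}")
```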

Results

Microsoft Copilot achieved the highest mean DISCERN score, 41.40 (95% CI: 40.31 to 42.49), and the lowest mean SMOG reading level, 12.56 (95% CI: 11.82 to 13.31), outperforming ChatGPT, Google Gemini, and Open Evidence in overall quality and accessibility (p < .001). Open Evidence performed worst on both measures, with a mean DISCERN score of 30.52 (95% CI: 27.52 to 33.52) and a mean SMOG level of 17.49 (95% CI: 16.66 to 18.31), reflecting a graduate reading level.
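The reported analysis (one-way ANOVA with post hoc testing and 95% confidence intervals) can be reproduced generically. The sketch below uses simulated placeholder data, since the per-response scores are not published: only the Copilot and Open Evidence means appear in the abstract, the ChatGPT and Gemini values are arbitrary, and Tukey HSD is one plausible post hoc test (the abstract does not name which was used; `scipy.stats.tukey_hsd` requires a recent SciPy).

```python
import numpy as np
from scipy import stats

# Simulated placeholder scores (25 queries per platform). Only the Copilot
# and Open Evidence means echo reported values; the rest are arbitrary.
rng = np.random.default_rng(0)
discern = {
    "Microsoft Copilot": rng.normal(41.4, 2.7, 25),
    "ChatGPT":           rng.normal(38.0, 3.0, 25),
    "Google Gemini":     rng.normal(37.0, 3.0, 25),
    "Open Evidence":     rng.normal(30.5, 7.3, 25),
}

# One-way ANOVA across the four platforms.
f_stat, p_value = stats.f_oneway(*discern.values())
print(f"ANOVA: F = {f_stat:.2f}, p = {p_value:.3g}")

# 95% confidence interval for each platform's mean (t distribution).
for name, s in discern.items():
    lo, hi = stats.t.interval(0.95, len(s) - 1, loc=s.mean(), scale=stats.sem(s))
    print(f"{name}: mean {s.mean():.2f} (95% CI {lo:.2f} to {hi:.2f})")

# Pairwise post hoc comparisons; Tukey HSD is one common choice.
print(stats.tukey_hsd(*discern.values()))
```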

Conclusion

Significant variability exists in the readability and quality of AI-generated responses to HNC-related queries, highlighting the need for platform-specific validation and oversight to ensure accurate, patient-centered communication.

Level of evidence

Our study is a cross-sectional analysis that evaluates chatbot responses using established grading tools. This aligns best with level 4 evidence.
Source journal
American Journal of Otolaryngology (Medicine - Otorhinolaryngology)
CiteScore: 4.40
Self-citation rate: 4.00%
Annual publication volume: 378 articles
Average review time: 41 days
Journal description: Be fully informed about developments in otology, neurotology, audiology, rhinology, allergy, laryngology, speech science, bronchoesophagology, facial plastic surgery, and head and neck surgery. Featured sections include original contributions, grand rounds, current reviews, case reports and socioeconomics.