Evaluating the Reliability and Quality of Sarcoidosis-Related Information Provided by AI Chatbots.

IF 2.4 | Medicine, CAS Zone 4 | Q2 HEALTH CARE SCIENCES & SERVICES
Nur Aleyna Yetkin, Burcu Baran, Bilal Rabahoğlu, Nuri Tutar, İnci Gülmez
{"title":"评估人工智能聊天机器人提供结节病相关信息的可靠性和质量。","authors":"Nur Aleyna Yetkin, Burcu Baran, Bilal Rabahoğlu, Nuri Tutar, İnci Gülmez","doi":"10.3390/healthcare13111344","DOIUrl":null,"url":null,"abstract":"<p><p><b>Background and Objectives:</b> Artificial intelligence (AI) chatbots are increasingly employed for the dissemination of health information; however, apprehensions regarding their accuracy and reliability remain. The intricacy of sarcoidosis may lead to misinformation and omissions that affect patient comprehension. This study assessed the usability of AI-generated information on sarcoidosis by evaluating the quality, reliability, readability, understandability, and actionability of chatbot responses to patient-centered queries. <b>Methods</b>: This cross-sectional evaluation included 11 AI chatbots comprising both general-purpose and retrieval-augmented tools. Four sarcoidosis-related queries derived from Google Trends were submitted to each chatbot under standardized conditions. Responses were independently evaluated by four blinded pulmonology experts using DISCERN, the Patient Education Materials Assessment Tool-Printable (PEMAT-P), and Flesch-Kincaid readability metrics. A Web Resource Rating (WRR) score was also calculated. Inter-rater reliability was assessed using intraclass correlation coefficients (ICCs). <b>Results</b>: Retrieval-augmented models such as ChatGPT-4o Deep Research, Perplexity Research, and Grok3 Deep Search outperformed general-purpose chatbots across the DISCERN, PEMAT-P, and WRR metrics. However, these high-performing models also produced text at significantly higher reading levels (Flesch-Kincaid Grade Level > 16), reducing accessibility. Actionability scores were consistently lower than understandability scores across all models. The ICCs exceeded 0.80 for all evaluation domains, indicating excellent inter-rater reliability. <b>Conclusions</b>: Although some AI chatbots can generate accurate and well-structured responses to sarcoidosis-related questions, their limited readability and low actionability present barriers for effective patient education. Optimization strategies, such as prompt refinement, health literacy adaptation, and domain-specific model development, are required to improve the utility of AI chatbots in complex disease communication.</p>","PeriodicalId":12977,"journal":{"name":"Healthcare","volume":"13 11","pages":""},"PeriodicalIF":2.4000,"publicationDate":"2025-06-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12154112/pdf/","citationCount":"0","resultStr":"{\"title\":\"Evaluating the Reliability and Quality of Sarcoidosis-Related Information Provided by AI Chatbots.\",\"authors\":\"Nur Aleyna Yetkin, Burcu Baran, Bilal Rabahoğlu, Nuri Tutar, İnci Gülmez\",\"doi\":\"10.3390/healthcare13111344\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"<p><p><b>Background and Objectives:</b> Artificial intelligence (AI) chatbots are increasingly employed for the dissemination of health information; however, apprehensions regarding their accuracy and reliability remain. The intricacy of sarcoidosis may lead to misinformation and omissions that affect patient comprehension. This study assessed the usability of AI-generated information on sarcoidosis by evaluating the quality, reliability, readability, understandability, and actionability of chatbot responses to patient-centered queries. 
<b>Methods</b>: This cross-sectional evaluation included 11 AI chatbots comprising both general-purpose and retrieval-augmented tools. Four sarcoidosis-related queries derived from Google Trends were submitted to each chatbot under standardized conditions. Responses were independently evaluated by four blinded pulmonology experts using DISCERN, the Patient Education Materials Assessment Tool-Printable (PEMAT-P), and Flesch-Kincaid readability metrics. A Web Resource Rating (WRR) score was also calculated. Inter-rater reliability was assessed using intraclass correlation coefficients (ICCs). <b>Results</b>: Retrieval-augmented models such as ChatGPT-4o Deep Research, Perplexity Research, and Grok3 Deep Search outperformed general-purpose chatbots across the DISCERN, PEMAT-P, and WRR metrics. However, these high-performing models also produced text at significantly higher reading levels (Flesch-Kincaid Grade Level > 16), reducing accessibility. Actionability scores were consistently lower than understandability scores across all models. The ICCs exceeded 0.80 for all evaluation domains, indicating excellent inter-rater reliability. <b>Conclusions</b>: Although some AI chatbots can generate accurate and well-structured responses to sarcoidosis-related questions, their limited readability and low actionability present barriers for effective patient education. Optimization strategies, such as prompt refinement, health literacy adaptation, and domain-specific model development, are required to improve the utility of AI chatbots in complex disease communication.</p>\",\"PeriodicalId\":12977,\"journal\":{\"name\":\"Healthcare\",\"volume\":\"13 11\",\"pages\":\"\"},\"PeriodicalIF\":2.4000,\"publicationDate\":\"2025-06-05\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12154112/pdf/\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Healthcare\",\"FirstCategoryId\":\"3\",\"ListUrlMain\":\"https://doi.org/10.3390/healthcare13111344\",\"RegionNum\":4,\"RegionCategory\":\"医学\",\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"Q2\",\"JCRName\":\"HEALTH CARE SCIENCES & SERVICES\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Healthcare","FirstCategoryId":"3","ListUrlMain":"https://doi.org/10.3390/healthcare13111344","RegionNum":4,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q2","JCRName":"HEALTH CARE SCIENCES & SERVICES","Score":null,"Total":0}
Citations: 0

Abstract

Evaluating the Reliability and Quality of Sarcoidosis-Related Information Provided by AI Chatbots.

Background and Objectives: Artificial intelligence (AI) chatbots are increasingly employed for the dissemination of health information; however, apprehensions regarding their accuracy and reliability remain. The intricacy of sarcoidosis may lead to misinformation and omissions that affect patient comprehension. This study assessed the usability of AI-generated information on sarcoidosis by evaluating the quality, reliability, readability, understandability, and actionability of chatbot responses to patient-centered queries. Methods: This cross-sectional evaluation included 11 AI chatbots comprising both general-purpose and retrieval-augmented tools. Four sarcoidosis-related queries derived from Google Trends were submitted to each chatbot under standardized conditions. Responses were independently evaluated by four blinded pulmonology experts using DISCERN, the Patient Education Materials Assessment Tool-Printable (PEMAT-P), and Flesch-Kincaid readability metrics. A Web Resource Rating (WRR) score was also calculated. Inter-rater reliability was assessed using intraclass correlation coefficients (ICCs). Results: Retrieval-augmented models such as ChatGPT-4o Deep Research, Perplexity Research, and Grok3 Deep Search outperformed general-purpose chatbots across the DISCERN, PEMAT-P, and WRR metrics. However, these high-performing models also produced text at significantly higher reading levels (Flesch-Kincaid Grade Level > 16), reducing accessibility. Actionability scores were consistently lower than understandability scores across all models. The ICCs exceeded 0.80 for all evaluation domains, indicating excellent inter-rater reliability. Conclusions: Although some AI chatbots can generate accurate and well-structured responses to sarcoidosis-related questions, their limited readability and low actionability present barriers for effective patient education. Optimization strategies, such as prompt refinement, health literacy adaptation, and domain-specific model development, are required to improve the utility of AI chatbots in complex disease communication.
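The Flesch-Kincaid Grade Level reported above is a standard readability formula: 0.39 × (words per sentence) + 11.8 × (syllables per word) − 15.59, where a score above 16 corresponds to graduate-level reading difficulty. The sketch below is a minimal illustration of how such a score can be computed for a chatbot response; it is not the study's actual pipeline, and the vowel-group syllable counter is a deliberate simplification of what dedicated readability tools do.

```python
import re

def count_syllables(word: str) -> int:
    """Rough syllable count: contiguous vowel groups, floored at 1.
    Dedicated readability tools use dictionary- or rule-based counters."""
    groups = re.findall(r"[aeiouy]+", word.lower())
    return max(1, len(groups))

def flesch_kincaid_grade(text: str) -> float:
    """Flesch-Kincaid Grade Level:
    0.39 * (words / sentences) + 11.8 * (syllables / words) - 15.59"""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    if not sentences or not words:
        return 0.0
    syllables = sum(count_syllables(w) for w in words)
    return (0.39 * (len(words) / len(sentences))
            + 11.8 * (syllables / len(words))
            - 15.59)

# Long, jargon-heavy sentences of the kind produced by retrieval-augmented
# models push the grade level well beyond typical patient reading levels.
sample = ("Sarcoidosis is a multisystem granulomatous disorder characterized "
          "by noncaseating granulomas, most frequently involving the "
          "pulmonary parenchyma and mediastinal lymph nodes.")
print(round(flesch_kincaid_grade(sample), 1))
```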

Source journal: Healthcare (Medicine / Health Policy)
CiteScore: 3.50
Self-citation rate: 7.10%
Articles published: 0
Review time: 47 days
Journal description: Healthcare (ISSN 2227-9032) is an international, peer-reviewed, open access journal (free for readers), which publishes original theoretical and empirical work in the interdisciplinary area of all aspects of medicine and health care research. Healthcare publishes Original Research Articles, Reviews, Case Reports, Research Notes and Short Communications. We encourage researchers to publish their experimental and theoretical results in as much detail as possible. For theoretical papers, full details of proofs must be provided so that the results can be checked; for experimental papers, full experimental details must be provided so that the results can be reproduced. Additionally, electronic files or software regarding the full details of the calculations, experimental procedure, etc., can be deposited along with the publication as “Supplementary Material”.