Artificial Intelligence-Prompted Explanations of Common Primary Care Diagnoses.

PRiMER (Leawood, Kan.) Pub Date: 2024-09-17 eCollection Date: 2024-01-01 DOI: 10.22454/PRiMER.2024.916089
Mafaz Kattih, Max Bressler, Logan R Smith, Anthony Schinelli, Rahul Mhaskar, Karim Hanna
{"title":"人工智能提示的常见初级保健诊断解释。","authors":"Mafaz Kattih, Max Bressler, Logan R Smith, Anthony Schinelli, Rahul Mhaskar, Karim Hanna","doi":"10.22454/PRiMER.2024.916089","DOIUrl":null,"url":null,"abstract":"<p><strong>Background: </strong>Artificial intelligence (AI)-generated explanations about medical topics may be clearer and more accessible than traditional evidence-based sources, enhancing patient understanding and autonomy. We evaluated different AI explanations for patients about common diagnoses to aid in patient care.</p><p><strong>Methods: </strong>We prompted ChatGPT 3.5, Google Bard, HuggingChat, and Claude 2 separately to generate a short patient education paragraph about seven common diagnoses. We used the Flesch Reading Ease (FRE) and Flesch-Kincaid Grade Level (FKGL) to evaluate the readability and grade level of the responses. We used the Agency for Healthcare Research and Quality's Patient Education Materials Assessment Tool (PEMAT) grading rubric to evaluate the understandability and actionability of responses.</p><p><strong>Results: </strong>Claude 2 demonstrated scores of FRE (67.0), FKGL (7.4), and PEMAT, 69% for understandability, and 34% for actionability. ChatGPT scores were FRE (58.5), FKGL (9.3), PEMAT (69% and 31%, respectively). Google Bard scores were FRE (50.1), FKGL (9.9), PEMAT (52% and 23%). HuggingChat scores were FRE (48.7) and FKGL (11.6), PEMAT (57% and 29%).</p><p><strong>Conclusion: </strong>Claude 2 and ChatGPT demonstrated superior readability and understandability, but practical application and patient outcomes need further exploration. This study is limited by the rapid development of these tools with newer improved models replacing the older ones. Additionally, the accuracy and clarity of AI responses is based on that of the user-generated response. The PEMAT grading rubric is also mainly used for patient information leaflets that include visual aids and may contain subjective evaluations.</p>","PeriodicalId":74494,"journal":{"name":"PRiMER (Leawood, Kan.)","volume":"8 ","pages":"51"},"PeriodicalIF":0.0000,"publicationDate":"2024-09-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11578395/pdf/","citationCount":"0","resultStr":"{\"title\":\"Artificial Intelligence-Prompted Explanations of Common Primary Care Diagnoses.\",\"authors\":\"Mafaz Kattih, Max Bressler, Logan R Smith, Anthony Schinelli, Rahul Mhaskar, Karim Hanna\",\"doi\":\"10.22454/PRiMER.2024.916089\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"<p><strong>Background: </strong>Artificial intelligence (AI)-generated explanations about medical topics may be clearer and more accessible than traditional evidence-based sources, enhancing patient understanding and autonomy. We evaluated different AI explanations for patients about common diagnoses to aid in patient care.</p><p><strong>Methods: </strong>We prompted ChatGPT 3.5, Google Bard, HuggingChat, and Claude 2 separately to generate a short patient education paragraph about seven common diagnoses. We used the Flesch Reading Ease (FRE) and Flesch-Kincaid Grade Level (FKGL) to evaluate the readability and grade level of the responses. 
We used the Agency for Healthcare Research and Quality's Patient Education Materials Assessment Tool (PEMAT) grading rubric to evaluate the understandability and actionability of responses.</p><p><strong>Results: </strong>Claude 2 demonstrated scores of FRE (67.0), FKGL (7.4), and PEMAT, 69% for understandability, and 34% for actionability. ChatGPT scores were FRE (58.5), FKGL (9.3), PEMAT (69% and 31%, respectively). Google Bard scores were FRE (50.1), FKGL (9.9), PEMAT (52% and 23%). HuggingChat scores were FRE (48.7) and FKGL (11.6), PEMAT (57% and 29%).</p><p><strong>Conclusion: </strong>Claude 2 and ChatGPT demonstrated superior readability and understandability, but practical application and patient outcomes need further exploration. This study is limited by the rapid development of these tools with newer improved models replacing the older ones. Additionally, the accuracy and clarity of AI responses is based on that of the user-generated response. The PEMAT grading rubric is also mainly used for patient information leaflets that include visual aids and may contain subjective evaluations.</p>\",\"PeriodicalId\":74494,\"journal\":{\"name\":\"PRiMER (Leawood, Kan.)\",\"volume\":\"8 \",\"pages\":\"51\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2024-09-17\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11578395/pdf/\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"PRiMER (Leawood, Kan.)\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.22454/PRiMER.2024.916089\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"2024/1/1 0:00:00\",\"PubModel\":\"eCollection\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"PRiMER (Leawood, Kan.)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.22454/PRiMER.2024.916089","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"2024/1/1 0:00:00","PubModel":"eCollection","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 0

Abstract

Background: Artificial intelligence (AI)-generated explanations of medical topics may be clearer and more accessible than traditional evidence-based sources, enhancing patient understanding and autonomy. We evaluated explanations of common diagnoses produced by different AI tools to assess their usefulness in patient care.

Methods: We prompted ChatGPT 3.5, Google Bard, HuggingChat, and Claude 2 separately to generate a short patient education paragraph about seven common diagnoses. We used the Flesch Reading Ease (FRE) and Flesch-Kincaid Grade Level (FKGL) to evaluate the readability and grade level of the responses. We used the Agency for Healthcare Research and Quality's Patient Education Materials Assessment Tool (PEMAT) grading rubric to evaluate the understandability and actionability of responses.
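For context, both readability metrics are simple functions of average sentence length and average syllables per word: FRE = 206.835 − 1.015 × (words/sentences) − 84.6 × (syllables/words), and FKGL = 0.39 × (words/sentences) + 11.8 × (syllables/words) − 15.59. Below is a minimal Python sketch of how such scores can be computed; it uses a naive vowel-group syllable heuristic and a hypothetical sample text, whereas published calculators (e.g., the textstat package) use more careful syllable counting.

```python
import re

def count_syllables(word: str) -> int:
    # Naive heuristic: one syllable per run of consecutive vowels.
    # Real calculators also handle silent 'e', diphthongs, etc.
    return max(1, len(re.findall(r"[aeiouy]+", word.lower())))

def readability(text: str) -> tuple[float, float]:
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    syllables = sum(count_syllables(w) for w in words)
    words_per_sentence = len(words) / len(sentences)
    syllables_per_word = syllables / len(words)
    # Flesch Reading Ease (higher = easier to read)
    fre = 206.835 - 1.015 * words_per_sentence - 84.6 * syllables_per_word
    # Flesch-Kincaid Grade Level (approximate US school grade)
    fkgl = 0.39 * words_per_sentence + 11.8 * syllables_per_word - 15.59
    return fre, fkgl

# Hypothetical patient-education snippet, for illustration only.
sample = ("Strep throat is an infection caused by bacteria. "
          "It can cause a sore throat, fever, and pain when you swallow.")
fre, fkgl = readability(sample)
print(f"FRE: {fre:.1f}  FKGL: {fkgl:.1f}")
```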

Results: Claude 2 scored FRE 67.0 and FKGL 7.4, with PEMAT scores of 69% for understandability and 34% for actionability. ChatGPT scored FRE 58.5 and FKGL 9.3 (PEMAT: 69% and 31%, respectively). Google Bard scored FRE 50.1 and FKGL 9.9 (PEMAT: 52% and 23%). HuggingChat scored FRE 48.7 and FKGL 11.6 (PEMAT: 57% and 29%).

Conclusion: Claude 2 and ChatGPT demonstrated superior readability and understandability, but practical application and patient outcomes need further exploration. This study is limited by the rapid development of these tools, with newer, improved models replacing older ones. Additionally, the accuracy and clarity of AI responses depend on the quality of the user-generated prompt. The PEMAT rubric is also designed mainly for patient information leaflets that include visual aids, and its scoring may involve subjective evaluation.
