Artificial intelligence chatbots as sources of patient education material for cataract surgery: ChatGPT-4 versus Google Bard.

Impact Factor: 2.0 · Q2 (Ophthalmology)
Matthew Azzopardi, Benjamin Ng, Abison Logeswaran, Constantinos Loizou, Ryan Chin Taw Cheong, Prasanth Gireesh, Darren Shu Jeng Ting, Yu Jeat Chong
{"title":"人工智能聊天机器人作为白内障手术患者教育材料的来源:ChatGPT-4 与 Google Bard 的对比。","authors":"Matthew Azzopardi, Benjamin Ng, Abison Logeswaran, Constantinos Loizou, Ryan Chin Taw Cheong, Prasanth Gireesh, Darren Shu Jeng Ting, Yu Jeat Chong","doi":"10.1136/bmjophth-2024-001824","DOIUrl":null,"url":null,"abstract":"<p><strong>Objective: </strong>To conduct a head-to-head comparative analysis of cataract surgery patient education material generated by Chat Generative Pre-trained Transformer (ChatGPT-4) and Google Bard.</p><p><strong>Methods and analysis: </strong>98 frequently asked questions on cataract surgery in English were taken in November 2023 from 5 trustworthy online patient information resources. 59 of these were curated (20 augmented for clarity and 39 duplicates excluded) and categorised into 3 domains: condition (n=15), preparation for surgery (n=21) and recovery after surgery (n=23). They were formulated into input prompts with 'prompt engineering'. Using the Patient Education Materials Assessment Tool-Printable (PEMAT-P) Auto-Scoring Form, four ophthalmologists independently graded ChatGPT-4 and Google Bard responses. The readability of responses was evaluated using a Flesch-Kincaid calculator. Responses were also subjectively examined for any inaccurate or harmful information.</p><p><strong>Results: </strong>Google Bard had a higher mean overall Flesch-Kincaid Level (8.02) compared with ChatGPT-4 (5.75) (p<0.001), also noted across all three domains. ChatGPT-4 had a higher overall PEMAT-P understandability score (85.8%) in comparison to Google Bard (80.9%) (p<0.001), which was also noted in the 'preparation for cataract surgery' (85.2% vs 75.7%; p<0.001) and 'recovery after cataract surgery' (86.5% vs 82.3%; p=0.004) domains. There was no statistically significant difference in overall (42.5% vs 44.2%; p=0.344) or individual domain actionability scores (p>0.10). None of the generated material contained dangerous information.</p><p><strong>Conclusion: </strong>In comparison to Google Bard, ChatGPT-4 fared better overall, scoring higher on the PEMAT-P understandability scale and exhibiting more faithfulness to the prompt engineering instruction. Since input prompts might vary from real-world patient searches, follow-up studies with patient participation are required.</p>","PeriodicalId":9286,"journal":{"name":"BMJ Open Ophthalmology","volume":"9 1","pages":""},"PeriodicalIF":2.0000,"publicationDate":"2024-10-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11487885/pdf/","citationCount":"0","resultStr":"{\"title\":\"Artificial intelligence chatbots as sources of patient education material for cataract surgery: ChatGPT-4 versus Google Bard.\",\"authors\":\"Matthew Azzopardi, Benjamin Ng, Abison Logeswaran, Constantinos Loizou, Ryan Chin Taw Cheong, Prasanth Gireesh, Darren Shu Jeng Ting, Yu Jeat Chong\",\"doi\":\"10.1136/bmjophth-2024-001824\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"<p><strong>Objective: </strong>To conduct a head-to-head comparative analysis of cataract surgery patient education material generated by Chat Generative Pre-trained Transformer (ChatGPT-4) and Google Bard.</p><p><strong>Methods and analysis: </strong>98 frequently asked questions on cataract surgery in English were taken in November 2023 from 5 trustworthy online patient information resources. 
59 of these were curated (20 augmented for clarity and 39 duplicates excluded) and categorised into 3 domains: condition (n=15), preparation for surgery (n=21) and recovery after surgery (n=23). They were formulated into input prompts with 'prompt engineering'. Using the Patient Education Materials Assessment Tool-Printable (PEMAT-P) Auto-Scoring Form, four ophthalmologists independently graded ChatGPT-4 and Google Bard responses. The readability of responses was evaluated using a Flesch-Kincaid calculator. Responses were also subjectively examined for any inaccurate or harmful information.</p><p><strong>Results: </strong>Google Bard had a higher mean overall Flesch-Kincaid Level (8.02) compared with ChatGPT-4 (5.75) (p<0.001), also noted across all three domains. ChatGPT-4 had a higher overall PEMAT-P understandability score (85.8%) in comparison to Google Bard (80.9%) (p<0.001), which was also noted in the 'preparation for cataract surgery' (85.2% vs 75.7%; p<0.001) and 'recovery after cataract surgery' (86.5% vs 82.3%; p=0.004) domains. There was no statistically significant difference in overall (42.5% vs 44.2%; p=0.344) or individual domain actionability scores (p>0.10). None of the generated material contained dangerous information.</p><p><strong>Conclusion: </strong>In comparison to Google Bard, ChatGPT-4 fared better overall, scoring higher on the PEMAT-P understandability scale and exhibiting more faithfulness to the prompt engineering instruction. Since input prompts might vary from real-world patient searches, follow-up studies with patient participation are required.</p>\",\"PeriodicalId\":9286,\"journal\":{\"name\":\"BMJ Open Ophthalmology\",\"volume\":\"9 1\",\"pages\":\"\"},\"PeriodicalIF\":2.0000,\"publicationDate\":\"2024-10-17\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11487885/pdf/\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"BMJ Open Ophthalmology\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1136/bmjophth-2024-001824\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"Q2\",\"JCRName\":\"OPHTHALMOLOGY\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"BMJ Open Ophthalmology","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1136/bmjophth-2024-001824","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q2","JCRName":"OPHTHALMOLOGY","Score":null,"Total":0}
引用次数: 0

Abstract

Objective: To conduct a head-to-head comparative analysis of cataract surgery patient education material generated by Chat Generative Pre-trained Transformer (ChatGPT-4) and Google Bard.

Methods and analysis: 98 frequently asked questions on cataract surgery, in English, were collected in November 2023 from five trustworthy online patient information resources. Of these, 59 were curated (20 augmented for clarity and 39 duplicates excluded) and categorised into three domains: condition (n=15), preparation for surgery (n=21) and recovery after surgery (n=23). These were formulated into input prompts using 'prompt engineering'. Four ophthalmologists independently graded the ChatGPT-4 and Google Bard responses using the Patient Education Materials Assessment Tool-Printable (PEMAT-P) Auto-Scoring Form. The readability of responses was evaluated using a Flesch-Kincaid calculator. Responses were also subjectively examined for any inaccurate or harmful information.
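As a rough illustration of the two scoring steps described above, the sketch below computes a Flesch-Kincaid Grade Level with the standard formula and a PEMAT-P percentage score as the share of applicable items rated 'Agree'. The syllable-counting heuristic, the sample response text and the item ratings are illustrative assumptions; the study's own calculator and scoring form may differ in detail.

```python
import re

def count_syllables(word: str) -> int:
    # Rough heuristic: count vowel groups, with a minimum of one per word.
    return max(1, len(re.findall(r"[aeiouy]+", word.lower())))

def flesch_kincaid_grade(text: str) -> float:
    # Standard formula: 0.39*(words/sentences) + 11.8*(syllables/words) - 15.59
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    syllables = sum(count_syllables(w) for w in words)
    return 0.39 * len(words) / len(sentences) + 11.8 * syllables / len(words) - 15.59

def pemat_percent(ratings: list[str]) -> float:
    # PEMAT scoring: items rated "Agree" divided by applicable items, as a percentage.
    applicable = [r for r in ratings if r in ("Agree", "Disagree")]
    return 100 * sum(r == "Agree" for r in applicable) / len(applicable)

sample_response = ("Cataract surgery replaces the cloudy lens in your eye with a "
                   "clear artificial lens. Most people go home on the same day.")
print(round(flesch_kincaid_grade(sample_response), 2))   # grade level of the sample text
print(pemat_percent(["Agree", "Agree", "Disagree", "Agree", "Not applicable"]))  # 75.0
```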

Results: Google Bard had a higher mean overall Flesch-Kincaid Level (8.02) compared with ChatGPT-4 (5.75) (p<0.001), also noted across all three domains. ChatGPT-4 had a higher overall PEMAT-P understandability score (85.8%) in comparison to Google Bard (80.9%) (p<0.001), which was also noted in the 'preparation for cataract surgery' (85.2% vs 75.7%; p<0.001) and 'recovery after cataract surgery' (86.5% vs 82.3%; p=0.004) domains. There was no statistically significant difference in overall (42.5% vs 44.2%; p=0.344) or individual domain actionability scores (p>0.10). None of the generated material contained dangerous information.
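The abstract reports p values but does not name the statistical test used. A minimal sketch of one plausible approach, a paired Wilcoxon signed-rank test on per-question understandability scores, is shown below; the score arrays are placeholders for illustration, not study data.

```python
from scipy.stats import wilcoxon

# Hypothetical per-question PEMAT-P understandability scores (%), not study data.
chatgpt4 = [88, 84, 90, 86, 82, 87, 85, 89, 83, 86]
bard     = [81, 79, 84, 80, 78, 83, 82, 80, 79, 82]

stat, p = wilcoxon(chatgpt4, bard)  # paired, non-parametric comparison
print(f"Wilcoxon statistic = {stat:.1f}, p = {p:.4f}")
```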

Conclusion: Compared with Google Bard, ChatGPT-4 fared better overall, scoring higher on the PEMAT-P understandability scale and adhering more closely to the prompt engineering instructions. Since input prompts may differ from real-world patient searches, follow-up studies with patient participation are required.

Source journal: BMJ Open Ophthalmology (Ophthalmology)
CiteScore: 3.40 · Self-citation rate: 4.20% · Articles published: 104 · Review time: 20 weeks