Artificial intelligence chatbots as sources of patient education material for cataract surgery: ChatGPT-4 versus Google Bard.

Impact Factor: 2.0 · Q2 (Ophthalmology)
Matthew Azzopardi, Benjamin Ng, Abison Logeswaran, Constantinos Loizou, Ryan Chin Taw Cheong, Prasanth Gireesh, Darren Shu Jeng Ting, Yu Jeat Chong
{"title":"人工智能聊天机器人作为白内障手术患者教育材料的来源:ChatGPT-4 与 Google Bard 的对比。","authors":"Matthew Azzopardi, Benjamin Ng, Abison Logeswaran, Constantinos Loizou, Ryan Chin Taw Cheong, Prasanth Gireesh, Darren Shu Jeng Ting, Yu Jeat Chong","doi":"10.1136/bmjophth-2024-001824","DOIUrl":null,"url":null,"abstract":"<p><strong>Objective: </strong>To conduct a head-to-head comparative analysis of cataract surgery patient education material generated by Chat Generative Pre-trained Transformer (ChatGPT-4) and Google Bard.</p><p><strong>Methods and analysis: </strong>98 frequently asked questions on cataract surgery in English were taken in November 2023 from 5 trustworthy online patient information resources. 59 of these were curated (20 augmented for clarity and 39 duplicates excluded) and categorised into 3 domains: condition (n=15), preparation for surgery (n=21) and recovery after surgery (n=23). They were formulated into input prompts with 'prompt engineering'. Using the Patient Education Materials Assessment Tool-Printable (PEMAT-P) Auto-Scoring Form, four ophthalmologists independently graded ChatGPT-4 and Google Bard responses. The readability of responses was evaluated using a Flesch-Kincaid calculator. Responses were also subjectively examined for any inaccurate or harmful information.</p><p><strong>Results: </strong>Google Bard had a higher mean overall Flesch-Kincaid Level (8.02) compared with ChatGPT-4 (5.75) (p<0.001), also noted across all three domains. ChatGPT-4 had a higher overall PEMAT-P understandability score (85.8%) in comparison to Google Bard (80.9%) (p<0.001), which was also noted in the 'preparation for cataract surgery' (85.2% vs 75.7%; p<0.001) and 'recovery after cataract surgery' (86.5% vs 82.3%; p=0.004) domains. There was no statistically significant difference in overall (42.5% vs 44.2%; p=0.344) or individual domain actionability scores (p>0.10). None of the generated material contained dangerous information.</p><p><strong>Conclusion: </strong>In comparison to Google Bard, ChatGPT-4 fared better overall, scoring higher on the PEMAT-P understandability scale and exhibiting more faithfulness to the prompt engineering instruction. Since input prompts might vary from real-world patient searches, follow-up studies with patient participation are required.</p>","PeriodicalId":9286,"journal":{"name":"BMJ Open Ophthalmology","volume":"9 1","pages":""},"PeriodicalIF":2.0000,"publicationDate":"2024-10-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11487885/pdf/","citationCount":"0","resultStr":"{\"title\":\"Artificial intelligence chatbots as sources of patient education material for cataract surgery: ChatGPT-4 versus Google Bard.\",\"authors\":\"Matthew Azzopardi, Benjamin Ng, Abison Logeswaran, Constantinos Loizou, Ryan Chin Taw Cheong, Prasanth Gireesh, Darren Shu Jeng Ting, Yu Jeat Chong\",\"doi\":\"10.1136/bmjophth-2024-001824\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"<p><strong>Objective: </strong>To conduct a head-to-head comparative analysis of cataract surgery patient education material generated by Chat Generative Pre-trained Transformer (ChatGPT-4) and Google Bard.</p><p><strong>Methods and analysis: </strong>98 frequently asked questions on cataract surgery in English were taken in November 2023 from 5 trustworthy online patient information resources. 
59 of these were curated (20 augmented for clarity and 39 duplicates excluded) and categorised into 3 domains: condition (n=15), preparation for surgery (n=21) and recovery after surgery (n=23). They were formulated into input prompts with 'prompt engineering'. Using the Patient Education Materials Assessment Tool-Printable (PEMAT-P) Auto-Scoring Form, four ophthalmologists independently graded ChatGPT-4 and Google Bard responses. The readability of responses was evaluated using a Flesch-Kincaid calculator. Responses were also subjectively examined for any inaccurate or harmful information.</p><p><strong>Results: </strong>Google Bard had a higher mean overall Flesch-Kincaid Level (8.02) compared with ChatGPT-4 (5.75) (p<0.001), also noted across all three domains. ChatGPT-4 had a higher overall PEMAT-P understandability score (85.8%) in comparison to Google Bard (80.9%) (p<0.001), which was also noted in the 'preparation for cataract surgery' (85.2% vs 75.7%; p<0.001) and 'recovery after cataract surgery' (86.5% vs 82.3%; p=0.004) domains. There was no statistically significant difference in overall (42.5% vs 44.2%; p=0.344) or individual domain actionability scores (p>0.10). None of the generated material contained dangerous information.</p><p><strong>Conclusion: </strong>In comparison to Google Bard, ChatGPT-4 fared better overall, scoring higher on the PEMAT-P understandability scale and exhibiting more faithfulness to the prompt engineering instruction. Since input prompts might vary from real-world patient searches, follow-up studies with patient participation are required.</p>\",\"PeriodicalId\":9286,\"journal\":{\"name\":\"BMJ Open Ophthalmology\",\"volume\":\"9 1\",\"pages\":\"\"},\"PeriodicalIF\":2.0000,\"publicationDate\":\"2024-10-17\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11487885/pdf/\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"BMJ Open Ophthalmology\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1136/bmjophth-2024-001824\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"Q2\",\"JCRName\":\"OPHTHALMOLOGY\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"BMJ Open Ophthalmology","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1136/bmjophth-2024-001824","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q2","JCRName":"OPHTHALMOLOGY","Score":null,"Total":0}
引用次数: 0

Abstract

Objective: To conduct a head-to-head comparative analysis of cataract surgery patient education material generated by Chat Generative Pre-trained Transformer (ChatGPT-4) and Google Bard.

Methods and analysis: 98 frequently asked questions on cataract surgery, in English, were collected in November 2023 from five trustworthy online patient information resources. Of these, 59 were curated (20 augmented for clarity and 39 duplicates excluded) and categorised into three domains: condition (n=15), preparation for surgery (n=21) and recovery after surgery (n=23). These were formulated into input prompts using 'prompt engineering'. Four ophthalmologists independently graded the ChatGPT-4 and Google Bard responses using the Patient Education Materials Assessment Tool-Printable (PEMAT-P) Auto-Scoring Form. The readability of responses was evaluated using a Flesch-Kincaid calculator. Responses were also subjectively examined for any inaccurate or harmful information.
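As a rough illustration of the two scoring steps described above, the sketch below computes a Flesch-Kincaid Grade Level with the standard formula and a PEMAT-P percentage score as the share of applicable items rated 'Agree'. The syllable-counting heuristic, the sample response text and the item ratings are illustrative assumptions; the study's own calculator and scoring form may differ in detail.

```python
import re

def count_syllables(word: str) -> int:
    # Rough heuristic: count vowel groups, with a minimum of one per word.
    return max(1, len(re.findall(r"[aeiouy]+", word.lower())))

def flesch_kincaid_grade(text: str) -> float:
    # Standard formula: 0.39*(words/sentences) + 11.8*(syllables/words) - 15.59
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    syllables = sum(count_syllables(w) for w in words)
    return 0.39 * len(words) / len(sentences) + 11.8 * syllables / len(words) - 15.59

def pemat_percent(ratings: list[str]) -> float:
    # PEMAT scoring: items rated "Agree" divided by applicable items, as a percentage.
    applicable = [r for r in ratings if r in ("Agree", "Disagree")]
    return 100 * sum(r == "Agree" for r in applicable) / len(applicable)

sample_response = ("Cataract surgery replaces the cloudy lens in your eye with a "
                   "clear artificial lens. Most people go home on the same day.")
print(round(flesch_kincaid_grade(sample_response), 2))   # grade level of the sample text
print(pemat_percent(["Agree", "Agree", "Disagree", "Agree", "Not applicable"]))  # 75.0
```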

Results: Google Bard had a higher mean overall Flesch-Kincaid Level (8.02) compared with ChatGPT-4 (5.75) (p<0.001), also noted across all three domains. ChatGPT-4 had a higher overall PEMAT-P understandability score (85.8%) in comparison to Google Bard (80.9%) (p<0.001), which was also noted in the 'preparation for cataract surgery' (85.2% vs 75.7%; p<0.001) and 'recovery after cataract surgery' (86.5% vs 82.3%; p=0.004) domains. There was no statistically significant difference in overall (42.5% vs 44.2%; p=0.344) or individual domain actionability scores (p>0.10). None of the generated material contained dangerous information.
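The abstract reports p values but does not name the statistical test used. A minimal sketch of one plausible approach, a paired Wilcoxon signed-rank test on per-question understandability scores, is shown below; the score arrays are placeholders for illustration, not study data.

```python
from scipy.stats import wilcoxon

# Hypothetical per-question PEMAT-P understandability scores (%), not study data.
chatgpt4 = [88, 84, 90, 86, 82, 87, 85, 89, 83, 86]
bard     = [81, 79, 84, 80, 78, 83, 82, 80, 79, 82]

stat, p = wilcoxon(chatgpt4, bard)  # paired, non-parametric comparison
print(f"Wilcoxon statistic = {stat:.1f}, p = {p:.4f}")
```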

Conclusion: Compared with Google Bard, ChatGPT-4 fared better overall, scoring higher on the PEMAT-P understandability scale and adhering more closely to the prompt engineering instructions. Since input prompts may differ from real-world patient searches, follow-up studies with patient participation are required.

Source journal: BMJ Open Ophthalmology (Ophthalmology)
CiteScore: 3.40 · Self-citation rate: 4.20% · Articles published: 104 · Review time: 20 weeks