Evaluating the role of AI chatbots in patient education for abdominal aortic aneurysms: a comparison of ChatGPT and conventional resources

Harry Collin MD, Chelsea Tong MBChB, Abhishekh Srinivas MD, Angus Pegler MD, Philip Allan MB ChB, Daniel Hagley MBBS FRACS

ANZ Journal of Surgery, 95(4): 784-788. Published 2025-03-05. DOI: 10.1111/ans.70053
Abstract
Background
Abdominal aortic aneurysms (AAA) carry significant risks, yet patient understanding is often limited and online resources are typically of low quality. ChatGPT, an artificial intelligence (AI) chatbot, presents a new frontier in patient education, but concerns remain about misinformation. This study evaluates the quality of ChatGPT-generated patient information on AAA.
Methods
Eight patient questions on AAA were sourced from the Healthdirect Australia (HDA) website, a reputable online patient-information resource funded by the Australian Government, and input into ChatGPT's free (ChatGPT-4o mini) and paid (ChatGPT-4) models. A vascular surgeon evaluated the appropriateness of each response. Readability was assessed using the Flesch–Kincaid test. The Patient Education Materials Assessment Tool (PEMAT) measured understandability and actionability, with responses scoring ≥75% on both considered high-quality.
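For readers unfamiliar with the two metrics, the sketch below illustrates how they are typically computed. It is a minimal illustration, not the authors' scoring pipeline: the Flesch–Kincaid grade-level formula is standard, but the syllable counter is a rough heuristic, and the PEMAT helper simply converts awarded points to the percentage against which the study's ≥75% threshold is applied. The sample text is hypothetical.

```python
import re

def count_syllables(word: str) -> int:
    """Very rough syllable estimate: count groups of consecutive vowels."""
    return max(1, len(re.findall(r"[aeiouy]+", word.lower())))

def flesch_kincaid_grade(text: str) -> float:
    """Standard Flesch-Kincaid grade-level formula:
    0.39 * (words / sentences) + 11.8 * (syllables / words) - 15.59
    """
    sentences = max(1, len(re.findall(r"[.!?]+", text)))
    words = re.findall(r"[A-Za-z']+", text)
    n_words = max(1, len(words))
    syllables = sum(count_syllables(w) for w in words)
    return 0.39 * (n_words / sentences) + 11.8 * (syllables / n_words) - 15.59

def pemat_score(points_awarded: int, points_applicable: int) -> float:
    """PEMAT domain score: points awarded as a percentage of applicable
    items; the study treats >=75% on both domains as high-quality."""
    return 100.0 * points_awarded / points_applicable

# Hypothetical example text, not taken from the study materials.
sample = ("An abdominal aortic aneurysm is a swelling in the main blood "
          "vessel that carries blood from your heart to your abdomen.")
print(f"Flesch-Kincaid grade: {flesch_kincaid_grade(sample):.1f}")
print(f"PEMAT domain score: {pemat_score(12, 16):.0f}%")
```

A grade of roughly 13 or above corresponds to the "college level" reported for ChatGPT responses; note that published readability tools use dictionary-based syllable counts, so this heuristic will differ slightly from the study's figures.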
Results
All responses were deemed clinically appropriate. Mean response length was longer for ChatGPT than for HDA. ChatGPT's readability was at a college level, while HDA's was at a 10th- to 12th-grade level. One response (generated by paid ChatGPT) was high-quality, with a PEMAT actionability score of ≥75%. Actionability scores were otherwise low across all sources; ChatGPT responses were more likely to contain identifiable actions, although these were often not clearly presented. ChatGPT responses were marginally more understandable than HDA's.
Conclusions
ChatGPT-generated information on AAA was appropriate and understandable, outperforming HDA in both respects. However, the AI responses were written at a more advanced reading level and lacked actionable instructions. AI chatbots show promise as supplemental tools for AAA patient education, but further refinement is needed to enhance their effectiveness in supporting informed decision-making.
Journal introduction
ANZ Journal of Surgery is published by Wiley on behalf of the Royal Australasian College of Surgeons to provide a medium for the publication of peer-reviewed original contributions related to clinical practice and/or research in all fields of surgery and related disciplines. It also provides a programme of continuing education for surgeons. All articles are peer-reviewed by at least two researchers with expertise in the field of the submitted paper.