Readability, quality and accuracy of generative artificial intelligence chatbots for commonly asked questions about labor epidurals: a comparison of ChatGPT and Bard

Impact Factor 2.6 · CAS Tier 3 (Medicine) · JCR Q2 (Anesthesiology)
D. Lee, M. Brown, J. Hammond, M. Zakowski
{"title":"生成式人工智能聊天机器人对硬膜外分娩常见问题的可读性、质量和准确性:ChatGPT和Bard的比较","authors":"D. Lee,&nbsp;M. Brown,&nbsp;J. Hammond,&nbsp;M. Zakowski","doi":"10.1016/j.ijoa.2024.104317","DOIUrl":null,"url":null,"abstract":"<div><h3>Introduction</h3><div>Over 90% of pregnant women and 76% expectant fathers search for pregnancy health information. We examined readability, accuracy and quality of answers to common obstetric anesthesia questions from the popular generative artificial intelligence (AI) chatbots ChatGPT and Bard.</div></div><div><h3>Methods</h3><div>Twenty questions for generative AI chatbots were derived from frequently asked questions based on professional society, hospital and consumer websites. ChatGPT and Bard were queried in November 2023. Answers were graded for accuracy by four obstetric anesthesiologists. Quality was measured using Patient Education Materials Assessment Tool for Print (PEMAT). Readability was measured using six readability indices. Accuracy, quality and readability were compared using independent <em>t</em>-test.</div></div><div><h3>Results</h3><div>Bard readability scores were high school level, significantly easier than ChatGPT’s college level by all scoring metrics (<em>P</em> &lt;0.001). Bard had significantly longer answers (<em>P</em> &lt;0.001), yet with similar accuracy of Bard (85 % ± 10) and ChatGPT (87 % ± 14) (<em>P</em> = 0.5). PEMAT understandability scores were no statistically significantly different (<em>P</em> = 0.06). Actionability by PEMAT scores for Bard was significantly higher (22% vs. 9%) than ChatGPT (<em>P</em> = 0.007)</div></div><div><h3>Conclusion</h3><div>Answers to questions about “labor epidurals” should be accurate, high quality, and easy to read. Bard at high school reading level, was well above the goal 4<sup>th</sup> to 6<sup>th</sup> grade level suggested for patient materials. Consumers, health care providers, hospitals and governmental agencies should be aware of the quality of information generated by chatbots. Chatbots should meet the standards for readability and understandability of health-related questions, to aid public understanding and enhance shared decision-making.</div></div>","PeriodicalId":14250,"journal":{"name":"International journal of obstetric anesthesia","volume":"61 ","pages":"Article 104317"},"PeriodicalIF":2.6000,"publicationDate":"2025-02-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"Readability, quality and accuracy of generative artificial intelligence chatbots for commonly asked questions about labor epidurals: a comparison of ChatGPT and Bard\",\"authors\":\"D. Lee,&nbsp;M. Brown,&nbsp;J. Hammond,&nbsp;M. Zakowski\",\"doi\":\"10.1016/j.ijoa.2024.104317\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"<div><h3>Introduction</h3><div>Over 90% of pregnant women and 76% expectant fathers search for pregnancy health information. We examined readability, accuracy and quality of answers to common obstetric anesthesia questions from the popular generative artificial intelligence (AI) chatbots ChatGPT and Bard.</div></div><div><h3>Methods</h3><div>Twenty questions for generative AI chatbots were derived from frequently asked questions based on professional society, hospital and consumer websites. ChatGPT and Bard were queried in November 2023. Answers were graded for accuracy by four obstetric anesthesiologists. Quality was measured using Patient Education Materials Assessment Tool for Print (PEMAT). 
Readability was measured using six readability indices. Accuracy, quality and readability were compared using independent <em>t</em>-test.</div></div><div><h3>Results</h3><div>Bard readability scores were high school level, significantly easier than ChatGPT’s college level by all scoring metrics (<em>P</em> &lt;0.001). Bard had significantly longer answers (<em>P</em> &lt;0.001), yet with similar accuracy of Bard (85 % ± 10) and ChatGPT (87 % ± 14) (<em>P</em> = 0.5). PEMAT understandability scores were no statistically significantly different (<em>P</em> = 0.06). Actionability by PEMAT scores for Bard was significantly higher (22% vs. 9%) than ChatGPT (<em>P</em> = 0.007)</div></div><div><h3>Conclusion</h3><div>Answers to questions about “labor epidurals” should be accurate, high quality, and easy to read. Bard at high school reading level, was well above the goal 4<sup>th</sup> to 6<sup>th</sup> grade level suggested for patient materials. Consumers, health care providers, hospitals and governmental agencies should be aware of the quality of information generated by chatbots. Chatbots should meet the standards for readability and understandability of health-related questions, to aid public understanding and enhance shared decision-making.</div></div>\",\"PeriodicalId\":14250,\"journal\":{\"name\":\"International journal of obstetric anesthesia\",\"volume\":\"61 \",\"pages\":\"Article 104317\"},\"PeriodicalIF\":2.6000,\"publicationDate\":\"2025-02-01\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"International journal of obstetric anesthesia\",\"FirstCategoryId\":\"3\",\"ListUrlMain\":\"https://www.sciencedirect.com/science/article/pii/S0959289X24003297\",\"RegionNum\":3,\"RegionCategory\":\"医学\",\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"Q2\",\"JCRName\":\"ANESTHESIOLOGY\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"International journal of obstetric anesthesia","FirstCategoryId":"3","ListUrlMain":"https://www.sciencedirect.com/science/article/pii/S0959289X24003297","RegionNum":3,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q2","JCRName":"ANESTHESIOLOGY","Score":null,"Total":0}
Citations: 0

Abstract

Introduction

Over 90% of pregnant women and 76% of expectant fathers search for pregnancy health information. We examined the readability, accuracy and quality of answers to common obstetric anesthesia questions from the popular generative artificial intelligence (AI) chatbots ChatGPT and Bard.

Methods

Twenty questions for the generative AI chatbots were derived from frequently asked questions on professional society, hospital and consumer websites. ChatGPT and Bard were queried in November 2023. Answers were graded for accuracy by four obstetric anesthesiologists. Quality was measured using the Patient Education Materials Assessment Tool for Print (PEMAT). Readability was measured using six readability indices. Accuracy, quality and readability were compared using independent t-tests.
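To make the analysis concrete, here is a minimal sketch of how the readability scoring and between-chatbot comparison could be scripted. The abstract does not name the six indices or publish the answer texts, so the six common indices from the `textstat` package and the placeholder answer lists below are assumptions for illustration only.

```python
# Sketch of the readability comparison described in the Methods.
# Assumption: the six indices are the common ones below; the paper
# does not name them. Answer lists are hypothetical placeholders.
import textstat
from scipy import stats

INDICES = [
    textstat.flesch_kincaid_grade,          # U.S. school grade level
    textstat.gunning_fog,
    textstat.smog_index,
    textstat.coleman_liau_index,
    textstat.automated_readability_index,
    textstat.dale_chall_readability_score,
]

def grade_levels(answers):
    """Mean readability grade across the six indices, one value per answer."""
    return [sum(index(a) for index in INDICES) / len(INDICES) for a in answers]

# Hypothetical inputs: one answer string per question for each chatbot
# (the study used 20 questions).
bard_answers = ["An epidural is a way to ease labor pain...", "..."]
chatgpt_answers = ["Epidural analgesia involves administration of...", "..."]

# Independent two-sample t-test, as reported in the abstract.
t, p = stats.ttest_ind(grade_levels(bard_answers), grade_levels(chatgpt_answers))
print(f"t = {t:.2f}, P = {p:.4f}")
```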

Results

Bard readability scores were at high school level, significantly easier than ChatGPT’s college level on all scoring metrics (P < 0.001). Bard gave significantly longer answers (P < 0.001), yet accuracy was similar for Bard (85% ± 10) and ChatGPT (87% ± 14) (P = 0.5). PEMAT understandability scores were not statistically significantly different (P = 0.06). PEMAT actionability scores were significantly higher for Bard than for ChatGPT (22% vs. 9%, P = 0.007).
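The accuracy comparison can be checked approximately from the summary statistics alone. The sketch below assumes n = 20 per group (one pooled accuracy score per question), which the abstract implies but does not state; the resulting P value is close to, though not identical with, the reported P = 0.5.

```python
# Two-sample t-test from summary statistics (mean ± SD, n per group).
# Assumption: n = 20 answers per chatbot; the paper may have pooled
# the four graders' scores differently.
from scipy.stats import ttest_ind_from_stats

t, p = ttest_ind_from_stats(
    mean1=85, std1=10, nobs1=20,   # Bard accuracy (%)
    mean2=87, std2=14, nobs2=20,   # ChatGPT accuracy (%)
)
print(f"t = {t:.2f}, P = {p:.2f}")  # ~0.6 here vs. the reported 0.5
```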

Conclusion

Answers to questions about “labor epidurals” should be accurate, high quality, and easy to read. Bard, at a high school reading level, was well above the 4th- to 6th-grade level suggested for patient materials. Consumers, health care providers, hospitals and governmental agencies should be aware of the quality of information generated by chatbots. Chatbots should meet readability and understandability standards for health-related questions, to aid public understanding and enhance shared decision-making.
Source journal: International Journal of Obstetric Anesthesia
CiteScore: 4.70
Self-citation rate: 7.10%
Articles per year: 285
Review time: 58 days
Journal description: The International Journal of Obstetric Anesthesia is the only journal publishing original articles devoted exclusively to obstetric anesthesia, bringing together all three of its principal components: anesthesia care for operative delivery and the perioperative period, pain relief in labour, and care of the critically ill obstetric patient. • Original research (both clinical and laboratory), short reports and case reports will be considered. • The journal also publishes invited review articles and debates on topical and controversial subjects in the area of obstetric anesthesia. • Articles on related topics such as perinatal physiology and pharmacology and all subjects of importance to obstetric anaesthetists/anesthesiologists are also welcome. The journal is peer-reviewed by international experts. Scholarship is stressed to include the focus on discovery, application of knowledge across fields, and informing the medical community. Through the peer-review process, we hope to attest to the quality of scholarship and guide the Journal to extend and transform knowledge in this important and expanding area.