Readability, quality and accuracy of generative artificial intelligence chatbots for commonly asked questions about labor epidurals: a comparison of ChatGPT and Bard

Impact Factor 2.6 · CAS Tier 3 (Medicine) · JCR Q2 (Anesthesiology)
D. Lee, M. Brown, J. Hammond, M. Zakowski
{"title":"生成式人工智能聊天机器人对硬膜外分娩常见问题的可读性、质量和准确性:ChatGPT和Bard的比较","authors":"D. Lee,&nbsp;M. Brown,&nbsp;J. Hammond,&nbsp;M. Zakowski","doi":"10.1016/j.ijoa.2024.104317","DOIUrl":null,"url":null,"abstract":"<div><h3>Introduction</h3><div>Over 90% of pregnant women and 76% expectant fathers search for pregnancy health information. We examined readability, accuracy and quality of answers to common obstetric anesthesia questions from the popular generative artificial intelligence (AI) chatbots ChatGPT and Bard.</div></div><div><h3>Methods</h3><div>Twenty questions for generative AI chatbots were derived from frequently asked questions based on professional society, hospital and consumer websites. ChatGPT and Bard were queried in November 2023. Answers were graded for accuracy by four obstetric anesthesiologists. Quality was measured using Patient Education Materials Assessment Tool for Print (PEMAT). Readability was measured using six readability indices. Accuracy, quality and readability were compared using independent <em>t</em>-test.</div></div><div><h3>Results</h3><div>Bard readability scores were high school level, significantly easier than ChatGPT’s college level by all scoring metrics (<em>P</em> &lt;0.001). Bard had significantly longer answers (<em>P</em> &lt;0.001), yet with similar accuracy of Bard (85 % ± 10) and ChatGPT (87 % ± 14) (<em>P</em> = 0.5). PEMAT understandability scores were no statistically significantly different (<em>P</em> = 0.06). Actionability by PEMAT scores for Bard was significantly higher (22% vs. 9%) than ChatGPT (<em>P</em> = 0.007)</div></div><div><h3>Conclusion</h3><div>Answers to questions about “labor epidurals” should be accurate, high quality, and easy to read. Bard at high school reading level, was well above the goal 4<sup>th</sup> to 6<sup>th</sup> grade level suggested for patient materials. Consumers, health care providers, hospitals and governmental agencies should be aware of the quality of information generated by chatbots. Chatbots should meet the standards for readability and understandability of health-related questions, to aid public understanding and enhance shared decision-making.</div></div>","PeriodicalId":14250,"journal":{"name":"International journal of obstetric anesthesia","volume":"61 ","pages":"Article 104317"},"PeriodicalIF":2.6000,"publicationDate":"2025-02-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"Readability, quality and accuracy of generative artificial intelligence chatbots for commonly asked questions about labor epidurals: a comparison of ChatGPT and Bard\",\"authors\":\"D. Lee,&nbsp;M. Brown,&nbsp;J. Hammond,&nbsp;M. Zakowski\",\"doi\":\"10.1016/j.ijoa.2024.104317\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"<div><h3>Introduction</h3><div>Over 90% of pregnant women and 76% expectant fathers search for pregnancy health information. We examined readability, accuracy and quality of answers to common obstetric anesthesia questions from the popular generative artificial intelligence (AI) chatbots ChatGPT and Bard.</div></div><div><h3>Methods</h3><div>Twenty questions for generative AI chatbots were derived from frequently asked questions based on professional society, hospital and consumer websites. ChatGPT and Bard were queried in November 2023. Answers were graded for accuracy by four obstetric anesthesiologists. Quality was measured using Patient Education Materials Assessment Tool for Print (PEMAT). 
Readability was measured using six readability indices. Accuracy, quality and readability were compared using independent <em>t</em>-test.</div></div><div><h3>Results</h3><div>Bard readability scores were high school level, significantly easier than ChatGPT’s college level by all scoring metrics (<em>P</em> &lt;0.001). Bard had significantly longer answers (<em>P</em> &lt;0.001), yet with similar accuracy of Bard (85 % ± 10) and ChatGPT (87 % ± 14) (<em>P</em> = 0.5). PEMAT understandability scores were no statistically significantly different (<em>P</em> = 0.06). Actionability by PEMAT scores for Bard was significantly higher (22% vs. 9%) than ChatGPT (<em>P</em> = 0.007)</div></div><div><h3>Conclusion</h3><div>Answers to questions about “labor epidurals” should be accurate, high quality, and easy to read. Bard at high school reading level, was well above the goal 4<sup>th</sup> to 6<sup>th</sup> grade level suggested for patient materials. Consumers, health care providers, hospitals and governmental agencies should be aware of the quality of information generated by chatbots. Chatbots should meet the standards for readability and understandability of health-related questions, to aid public understanding and enhance shared decision-making.</div></div>\",\"PeriodicalId\":14250,\"journal\":{\"name\":\"International journal of obstetric anesthesia\",\"volume\":\"61 \",\"pages\":\"Article 104317\"},\"PeriodicalIF\":2.6000,\"publicationDate\":\"2025-02-01\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"International journal of obstetric anesthesia\",\"FirstCategoryId\":\"3\",\"ListUrlMain\":\"https://www.sciencedirect.com/science/article/pii/S0959289X24003297\",\"RegionNum\":3,\"RegionCategory\":\"医学\",\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"Q2\",\"JCRName\":\"ANESTHESIOLOGY\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"International journal of obstetric anesthesia","FirstCategoryId":"3","ListUrlMain":"https://www.sciencedirect.com/science/article/pii/S0959289X24003297","RegionNum":3,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q2","JCRName":"ANESTHESIOLOGY","Score":null,"Total":0}
Citations: 0

Abstract

Introduction

Over 90% of pregnant women and 76% of expectant fathers search for pregnancy health information. We examined the readability, accuracy and quality of answers to common obstetric anesthesia questions from the popular generative artificial intelligence (AI) chatbots ChatGPT and Bard.

Methods

Twenty questions for the generative AI chatbots were derived from frequently asked questions on professional society, hospital and consumer websites. ChatGPT and Bard were queried in November 2023. Answers were graded for accuracy by four obstetric anesthesiologists. Quality was measured using the Patient Education Materials Assessment Tool for Print (PEMAT). Readability was measured using six readability indices. Accuracy, quality and readability were compared using independent t-tests.
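To make the analysis concrete, here is a minimal sketch of how the readability scoring and between-chatbot comparison could be scripted. The abstract does not name the six indices or publish the answer texts, so the six common indices from the `textstat` package and the placeholder answer lists below are assumptions for illustration only.

```python
# Sketch of the readability comparison described in the Methods.
# Assumption: the six indices are the common ones below; the paper
# does not name them. Answer lists are hypothetical placeholders.
import textstat
from scipy import stats

INDICES = [
    textstat.flesch_kincaid_grade,          # U.S. school grade level
    textstat.gunning_fog,
    textstat.smog_index,
    textstat.coleman_liau_index,
    textstat.automated_readability_index,
    textstat.dale_chall_readability_score,
]

def grade_levels(answers):
    """Mean readability grade across the six indices, one value per answer."""
    return [sum(index(a) for index in INDICES) / len(INDICES) for a in answers]

# Hypothetical inputs: one answer string per question for each chatbot
# (the study used 20 questions).
bard_answers = ["An epidural is a way to ease labor pain...", "..."]
chatgpt_answers = ["Epidural analgesia involves administration of...", "..."]

# Independent two-sample t-test, as reported in the abstract.
t, p = stats.ttest_ind(grade_levels(bard_answers), grade_levels(chatgpt_answers))
print(f"t = {t:.2f}, P = {p:.4f}")
```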

Results

Bard readability scores were at high school level, significantly easier than ChatGPT’s college level on all scoring metrics (P < 0.001). Bard gave significantly longer answers (P < 0.001), yet accuracy was similar for Bard (85% ± 10) and ChatGPT (87% ± 14) (P = 0.5). PEMAT understandability scores were not statistically significantly different (P = 0.06). PEMAT actionability scores were significantly higher for Bard than for ChatGPT (22% vs. 9%, P = 0.007).
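The accuracy comparison can be checked approximately from the summary statistics alone. The sketch below assumes n = 20 per group (one pooled accuracy score per question), which the abstract implies but does not state; the resulting P value is close to, though not identical with, the reported P = 0.5.

```python
# Two-sample t-test from summary statistics (mean ± SD, n per group).
# Assumption: n = 20 answers per chatbot; the paper may have pooled
# the four graders' scores differently.
from scipy.stats import ttest_ind_from_stats

t, p = ttest_ind_from_stats(
    mean1=85, std1=10, nobs1=20,   # Bard accuracy (%)
    mean2=87, std2=14, nobs2=20,   # ChatGPT accuracy (%)
)
print(f"t = {t:.2f}, P = {p:.2f}")  # ~0.6 here vs. the reported 0.5
```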

Conclusion

Answers to questions about “labor epidurals” should be accurate, high quality, and easy to read. Bard, at a high school reading level, was well above the 4th- to 6th-grade level suggested for patient materials. Consumers, health care providers, hospitals and governmental agencies should be aware of the quality of information generated by chatbots. Chatbots should meet readability and understandability standards for health-related questions, to aid public understanding and enhance shared decision-making.
Source journal: International Journal of Obstetric Anesthesia
CiteScore: 4.70
Self-citation rate: 7.10%
Articles per year: 285
Review time: 58 days
Journal description: The International Journal of Obstetric Anesthesia is the only journal publishing original articles devoted exclusively to obstetric anesthesia, bringing together all three of its principal components: anesthesia care for operative delivery and the perioperative period, pain relief in labour, and care of the critically ill obstetric patient. • Original research (both clinical and laboratory), short reports and case reports will be considered. • The journal also publishes invited review articles and debates on topical and controversial subjects in the area of obstetric anesthesia. • Articles on related topics such as perinatal physiology and pharmacology and all subjects of importance to obstetric anaesthetists/anesthesiologists are also welcome. The journal is peer-reviewed by international experts. Scholarship is stressed to include the focus on discovery, application of knowledge across fields, and informing the medical community. Through the peer-review process, we hope to attest to the quality of scholarship and guide the Journal to extend and transform knowledge in this important and expanding area.