{"title":"生成式人工智能聊天机器人对硬膜外分娩常见问题的可读性、质量和准确性:ChatGPT和Bard的比较","authors":"D. Lee, M. Brown, J. Hammond, M. Zakowski","doi":"10.1016/j.ijoa.2024.104317","DOIUrl":null,"url":null,"abstract":"<div><h3>Introduction</h3><div>Over 90% of pregnant women and 76% expectant fathers search for pregnancy health information. We examined readability, accuracy and quality of answers to common obstetric anesthesia questions from the popular generative artificial intelligence (AI) chatbots ChatGPT and Bard.</div></div><div><h3>Methods</h3><div>Twenty questions for generative AI chatbots were derived from frequently asked questions based on professional society, hospital and consumer websites. ChatGPT and Bard were queried in November 2023. Answers were graded for accuracy by four obstetric anesthesiologists. Quality was measured using Patient Education Materials Assessment Tool for Print (PEMAT). Readability was measured using six readability indices. Accuracy, quality and readability were compared using independent <em>t</em>-test.</div></div><div><h3>Results</h3><div>Bard readability scores were high school level, significantly easier than ChatGPT’s college level by all scoring metrics (<em>P</em> <0.001). Bard had significantly longer answers (<em>P</em> <0.001), yet with similar accuracy of Bard (85 % ± 10) and ChatGPT (87 % ± 14) (<em>P</em> = 0.5). PEMAT understandability scores were no statistically significantly different (<em>P</em> = 0.06). Actionability by PEMAT scores for Bard was significantly higher (22% vs. 9%) than ChatGPT (<em>P</em> = 0.007)</div></div><div><h3>Conclusion</h3><div>Answers to questions about “labor epidurals” should be accurate, high quality, and easy to read. Bard at high school reading level, was well above the goal 4<sup>th</sup> to 6<sup>th</sup> grade level suggested for patient materials. Consumers, health care providers, hospitals and governmental agencies should be aware of the quality of information generated by chatbots. Chatbots should meet the standards for readability and understandability of health-related questions, to aid public understanding and enhance shared decision-making.</div></div>","PeriodicalId":14250,"journal":{"name":"International journal of obstetric anesthesia","volume":"61 ","pages":"Article 104317"},"PeriodicalIF":2.6000,"publicationDate":"2025-02-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"Readability, quality and accuracy of generative artificial intelligence chatbots for commonly asked questions about labor epidurals: a comparison of ChatGPT and Bard\",\"authors\":\"D. Lee, M. Brown, J. Hammond, M. Zakowski\",\"doi\":\"10.1016/j.ijoa.2024.104317\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"<div><h3>Introduction</h3><div>Over 90% of pregnant women and 76% expectant fathers search for pregnancy health information. We examined readability, accuracy and quality of answers to common obstetric anesthesia questions from the popular generative artificial intelligence (AI) chatbots ChatGPT and Bard.</div></div><div><h3>Methods</h3><div>Twenty questions for generative AI chatbots were derived from frequently asked questions based on professional society, hospital and consumer websites. ChatGPT and Bard were queried in November 2023. Answers were graded for accuracy by four obstetric anesthesiologists. Quality was measured using Patient Education Materials Assessment Tool for Print (PEMAT). Readability was measured using six readability indices. 
Accuracy, quality and readability were compared using independent <em>t</em>-test.</div></div><div><h3>Results</h3><div>Bard readability scores were high school level, significantly easier than ChatGPT’s college level by all scoring metrics (<em>P</em> <0.001). Bard had significantly longer answers (<em>P</em> <0.001), yet with similar accuracy of Bard (85 % ± 10) and ChatGPT (87 % ± 14) (<em>P</em> = 0.5). PEMAT understandability scores were no statistically significantly different (<em>P</em> = 0.06). Actionability by PEMAT scores for Bard was significantly higher (22% vs. 9%) than ChatGPT (<em>P</em> = 0.007)</div></div><div><h3>Conclusion</h3><div>Answers to questions about “labor epidurals” should be accurate, high quality, and easy to read. Bard at high school reading level, was well above the goal 4<sup>th</sup> to 6<sup>th</sup> grade level suggested for patient materials. Consumers, health care providers, hospitals and governmental agencies should be aware of the quality of information generated by chatbots. Chatbots should meet the standards for readability and understandability of health-related questions, to aid public understanding and enhance shared decision-making.</div></div>\",\"PeriodicalId\":14250,\"journal\":{\"name\":\"International journal of obstetric anesthesia\",\"volume\":\"61 \",\"pages\":\"Article 104317\"},\"PeriodicalIF\":2.6000,\"publicationDate\":\"2025-02-01\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"International journal of obstetric anesthesia\",\"FirstCategoryId\":\"3\",\"ListUrlMain\":\"https://www.sciencedirect.com/science/article/pii/S0959289X24003297\",\"RegionNum\":3,\"RegionCategory\":\"医学\",\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"Q2\",\"JCRName\":\"ANESTHESIOLOGY\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"International journal of obstetric anesthesia","FirstCategoryId":"3","ListUrlMain":"https://www.sciencedirect.com/science/article/pii/S0959289X24003297","RegionNum":3,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q2","JCRName":"ANESTHESIOLOGY","Score":null,"Total":0}
Readability, quality and accuracy of generative artificial intelligence chatbots for commonly asked questions about labor epidurals: a comparison of ChatGPT and Bard
D. Lee, M. Brown, J. Hammond, M. Zakowski
International Journal of Obstetric Anesthesia 2025;61:104317. doi: 10.1016/j.ijoa.2024.104317
Introduction
Over 90% of pregnant women and 76% of expectant fathers search for pregnancy health information. We examined the readability, accuracy and quality of answers to common obstetric anesthesia questions from the popular generative artificial intelligence (AI) chatbots ChatGPT and Bard.
Methods
Twenty questions for generative AI chatbots were derived from frequently asked questions on professional society, hospital and consumer websites. ChatGPT and Bard were queried in November 2023. Answers were graded for accuracy by four obstetric anesthesiologists. Quality was measured using the Patient Education Materials Assessment Tool for Print (PEMAT). Readability was measured using six readability indices. Accuracy, quality and readability were compared using independent t-tests.
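The abstract does not specify which six readability indices were used or how the statistics were coded; the sketch below is a minimal, hypothetical version of such a pipeline, assuming six common readability formulas (via the `textstat` package) and per-index independent t-tests (via `scipy.stats`). The choice of indices and all variable names are illustrative assumptions, not the authors' actual protocol.

```python
# Minimal sketch of a readability-scoring-and-comparison pipeline.
# Assumes: pip install textstat scipy
# The six indices below are common choices; the paper does not name
# the six it actually used.
import textstat
from scipy.stats import ttest_ind

READABILITY_INDICES = {
    "Flesch Reading Ease": textstat.flesch_reading_ease,    # higher = easier
    "Flesch-Kincaid Grade": textstat.flesch_kincaid_grade,  # US grade level
    "Gunning Fog": textstat.gunning_fog,
    "SMOG": textstat.smog_index,
    "Coleman-Liau": textstat.coleman_liau_index,
    "Automated Readability Index": textstat.automated_readability_index,
}

def score_answers(answers):
    """Score each chatbot answer on every readability index."""
    return {name: [fn(text) for text in answers]
            for name, fn in READABILITY_INDICES.items()}

def compare_models(scores_a, scores_b):
    """Run one independent t-test per readability index."""
    for name in READABILITY_INDICES:
        t, p = ttest_ind(scores_a[name], scores_b[name])
        print(f"{name}: t = {t:.2f}, P = {p:.4f}")

# Hypothetical usage: bard_answers and chatgpt_answers are lists of the
# 20 answer strings returned by each chatbot.
# compare_models(score_answers(bard_answers), score_answers(chatgpt_answers))
```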
Results
Bard readability scores were at high school level, significantly easier to read than ChatGPT’s college level on all scoring metrics (P < 0.001). Bard produced significantly longer answers (P < 0.001), yet accuracy was similar for Bard (85% ± 10) and ChatGPT (87% ± 14) (P = 0.5). PEMAT understandability scores were not statistically significantly different (P = 0.06). PEMAT actionability scores were significantly higher for Bard than for ChatGPT (22% vs. 9%, P = 0.007).
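As a rough plausibility check (not the authors' analysis), the accuracy comparison can be reconstructed from the summary statistics alone, assuming the ± values are standard deviations and each model was scored once per question (n = 20 per group); `scipy.stats.ttest_ind_from_stats` runs an independent t-test directly from means, SDs and sample sizes.

```python
# Back-of-the-envelope t-test from the abstract's summary statistics:
# Bard 85% ± 10 vs. ChatGPT 87% ± 14 over 20 questions each.
# Assumption: ± denotes SD and n = 20 per group.
from scipy.stats import ttest_ind_from_stats

t, p = ttest_ind_from_stats(
    mean1=85.0, std1=10.0, nobs1=20,  # Bard accuracy
    mean2=87.0, std2=14.0, nobs2=20,  # ChatGPT accuracy
)
print(f"t = {t:.2f}, P = {p:.2f}")  # P comes out near 0.6, consistent
                                    # with the reported non-significant P = 0.5
```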
Conclusion
Answers to questions about “labor epidurals” should be accurate, high quality, and easy to read. Bard, at a high school reading level, was well above the 4th to 6th grade target suggested for patient materials. Consumers, health care providers, hospitals and governmental agencies should be aware of the quality of information generated by chatbots. Chatbots should meet readability and understandability standards for health-related information, to aid public understanding and enhance shared decision-making.
Journal introduction
The International Journal of Obstetric Anesthesia is the only journal publishing original articles devoted exclusively to obstetric anesthesia, bringing together all three of its principal components: anesthesia care for operative delivery and the perioperative period, pain relief in labour, and care of the critically ill obstetric patient.
• Original research (both clinical and laboratory), short reports and case reports will be considered.
• The journal also publishes invited review articles and debates on topical and controversial subjects in the area of obstetric anesthesia.
• Articles on related topics such as perinatal physiology and pharmacology and all subjects of importance to obstetric anaesthetists/anesthesiologists are also welcome.
The journal is peer-reviewed by international experts. Scholarship is stressed, encompassing discovery, the application of knowledge across fields, and informing the medical community. Through the peer-review process, we hope to attest to the quality of scholarship and to guide the Journal in extending and transforming knowledge in this important and expanding area.