Enhancing perinatal health patient information through ChatGPT – An accuracy study
P.L.M. de Vries, D. Baud, S. Baggio, M. Ceulemans, G. Favre, E. Gerbier, H. Legardeur, E. Maisonneuve, C. Pena-Reyes, L. Pomar, U. Winterfeld, A. Panchaud
PEC Innovation, volume 6, Article 100381. Journal article, published 2025-02-10.
DOI: 10.1016/j.pecinn.2025.100381
https://www.sciencedirect.com/science/article/pii/S277262822500010X
Abstract
Objectives
To evaluate ChatGPT's accuracy as an information source for women and maternity-care workers on “nutrition” and “red flags” in pregnancy.
Methods
Eight raters assessed the accuracy of ChatGPT-generated recommendations on a 5-point Likert scale for ten indicators per topic in four languages (French, English, German and Dutch). Accuracy and inter-rater agreement were calculated per topic and language.
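As an illustration of the kind of summary described here, the following minimal Python sketch computes a median, an interquartile range and an overall percentage from a set of Likert ratings. The abstract does not state how the overall accuracy percentage or the agreement statistic was derived, so the percent-of-maximum formula, the `summarise` helper and the example ratings are all assumptions made for illustration, not the authors' method or data.

```python
from statistics import median, quantiles

LIKERT_MAX = 5  # top of the 5-point scale described in the Methods


def summarise(scores):
    """Return median, quartiles and percent-of-maximum for a list of Likert scores."""
    # Inclusive quartiles treat the ratings as the full set of observations.
    q1, _, q3 = quantiles(scores, n=4, method="inclusive")
    return {
        "median": median(scores),
        "iqr": (q1, q3),
        # Assumption: "overall accuracy" taken as the share of the maximum
        # attainable score; the abstract does not define the calculation.
        "percent_of_max": 100 * sum(scores) / (LIKERT_MAX * len(scores)),
    }


# Hypothetical ratings: eight raters scoring one indicator in one language.
example_scores = [5, 5, 4, 5, 4, 5, 5, 3]
summary = summarise(example_scores)
print(summary)
```

Repeating the same call per topic and per language would give the per-group figures that the Methods refer to.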
Results
For both topics, median accuracy scores of ChatGPT-generated recommendations were excellent (5.0; IQR 4–5) regardless of language. Depending on how a question was framed, median accuracy scores varied by at most 1 point on the 5-point Likert scale. Overall accuracy scores were 83–89% for ‘nutrition in pregnancy’ versus 96–98% for ‘red flags in pregnancy’. Inter-rater agreement was good to excellent for both topics.
Conclusion
Although ChatGPT generated accurate recommendations for the tested indicators on nutrition and red flags during pregnancy, women should be aware of its limitations, such as inconsistencies depending on question formulation, language and the woman's personal context.
Innovation
Despite growing interest in the potential use of artificial intelligence in healthcare, this is, to the best of our knowledge, the first study to assess factors, such as language and question framing, that may limit the accuracy of ChatGPT-generated recommendations in key domains of perinatal health.