ChatGPT: An Evaluation of AI-Generated Responses to Commonly Asked Pregnancy Questions

Christopher Wan, Angelo Cadiente, Keren Khromchenko, Natalie Friedricks, Rima A. Rana, Jonathan D. Baum
DOI: 10.4236/ojog.2023.139129
Journal: Open Journal of Obstetrics and Gynecology, 2023

Abstract

Background: A recent assessment of ChatGPT on a variety of obstetric and gynecologic topics was very encouraging. However, its ability to respond to commonly asked pregnancy questions is unknown. Reference verification needs to be examined as well. Purpose: To evaluate ChatGPT as a source of information for commonly asked pregnancy questions and to verify the references it provides. Methods: Qualitative analysis of ChatGPT was performed. We queried ChatGPT Version 3.5 on 12 commonly asked pregnancy questions and asked for its references. Query responses were graded as “acceptable” or “not acceptable” based on correctness and completeness in comparison to American College of Obstetricians and Gynecologists (ACOG) publications, PubMed-indexed evidence, and clinical experience. References were classified as “verified”, “broken”, “irrelevant”, “non-existent” or “no references”. Review and grading of responses and references were performed by the co-authors individually and then as a group to formulate a consensus. Results: In our assessment, a grade of acceptable was given to 50% of responses (6 out of 12 questions). A grade of not acceptable was assigned to the remaining 50% of responses (5 were incomplete and 1 was incorrect). In regard to references, 58% (7 out of 12) had deficiencies (5 had no references, 1 had a broken reference, and 1 non-existent reference was provided). Conclusion: Our evaluation of ChatGPT confirms prior concerns regarding both content and references. While AI has enormous potential, it must be carefully evaluated before being accepted as accurate and reliable for this purpose.
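The aggregate percentages reported in the Results can be checked with a short sketch. The per-item grade lists below are reconstructed from the abstract's counts (6 acceptable / 5 incomplete / 1 incorrect responses; 5 verified references versus 5 missing, 1 broken, 1 non-existent) and are illustrative only, not the paper's raw data:

```python
# Reproduce the percentages reported in the abstract.
# Grade lists are reconstructed from the aggregate counts, not raw study data.
response_grades = ["acceptable"] * 6 + ["incomplete"] * 5 + ["incorrect"]
reference_grades = ["verified"] * 5 + ["no references"] * 5 + ["broken", "non-existent"]

acceptable = sum(g == "acceptable" for g in response_grades)
deficient = sum(g != "verified" for g in reference_grades)

print(f"Acceptable responses: {acceptable}/12 = {acceptable / 12:.0%}")  # 6/12 = 50%
print(f"Deficient references: {deficient}/12 = {deficient / 12:.0%}")   # 7/12 = 58%
```

Note that 7/12 rounds to the 58% figure quoted in the abstract.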