Accuracy and Safety of ChatGPT-3.5 in Assessing Over-the-Counter Medication Use During Pregnancy: A Descriptive Comparative Study.

Impact factor: 1.8 | JCR: Q3 | Category: Pharmacology & Pharmacy

Pharmacy, 13(4) | Pub Date: 2025-07-30 | DOI: 10.3390/pharmacy13040104

Bernadette Cornelison, David R Axon, Bryan Abbott, Carter Bishop, Cindy Jebara, Anjali Kumar, Kristen A Root


Citations: 0

Abstract

As artificial intelligence (AI) is increasingly used to perform tasks that require human intelligence, pregnant patients may turn to AI for advice on over-the-counter (OTC) medications. However, medications used in pregnancy may raise serious safety concerns, and the data available to assess them are limited. This study examines a chatbot's ability to provide accurate information about OTC medication use in pregnant patients. A prospective, descriptive design was used to compare the responses generated by the Chat Generative Pre-Trained Transformer 3.5 (ChatGPT-3.5) with the information provided by UpToDate®. Eighty-seven of the top pharmacist-recommended OTC drugs in the United States (U.S.), as identified by Pharmacy Times, were assessed for safe use in pregnancy using ChatGPT-3.5. A piloted, standard prompt was entered into ChatGPT-3.5, and the responses were recorded. Two groups independently rated the responses against UpToDate for correctness, completeness, and safety using a 5-point Likert scale. After the independent evaluations, the groups discussed their findings to reach a consensus, and a third independent investigator gave the final ratings. For correctness, the median score was 5 (interquartile range [IQR]: 5-5). For completeness, the median score was 4 (IQR: 4-5). For safety, the median score was 5 (IQR: 5-5). Despite the high overall scores, safety errors in 9% of the evaluations (n = 8), including omissions that pose a risk of serious complications, currently render the chatbot an unsafe standalone resource for this purpose.
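The summary statistics in the abstract (median, IQR, and the share of evaluations with safety errors) can be reproduced with a few lines of Python. The sketch below is illustrative only: `summarize` and `example_scores` are hypothetical names, the example ratings are made up and are not the study's data, and the quartile method ("inclusive") is an assumption, since the abstract does not say how the IQR was computed. Only the counts 87 and 8 come from the abstract.

```python
from statistics import median, quantiles


def summarize(scores):
    """Return (median, Q1, Q3) for a list of 1-5 Likert ratings.

    Quartiles use the "inclusive" method; this is an assumption,
    the abstract does not state which method the authors used.
    """
    q1, _, q3 = quantiles(scores, n=4, method="inclusive")
    return median(scores), q1, q3


# Hypothetical ratings for illustration only; NOT the study's data.
example_scores = [5, 5, 4, 5, 3, 5, 5, 4, 5, 5]
med, q1, q3 = summarize(example_scores)
print(f"median = {med}, IQR = {q1}-{q3}")

# Counts reported in the abstract: 8 safety errors among 87 drugs.
n_drugs = 87
n_safety_errors = 8
error_rate = n_safety_errors / n_drugs
print(f"safety error rate = {error_rate:.1%}")
```

With the reported counts, 8 of 87 is roughly 9.2%, which is consistent with the 9% stated in the abstract.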




Source journal: Pharmacy (Pharmacology & Pharmacy)

Self-citation rate: 9.10%

Articles published: 141

Review time: 11 weeks