Accuracy and Safety of ChatGPT-3.5 in Assessing Over-the-Counter Medication Use During Pregnancy: A Descriptive Comparative Study
Bernadette Cornelison, David R Axon, Bryan Abbott, Carter Bishop, Cindy Jebara, Anjali Kumar, Kristen A Root
Pharmacy 13(4), published 2025-07-30. Journal Article. DOI: 10.3390/pharmacy13040104 (https://doi.org/10.3390/pharmacy13040104)
Full-text PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12389367/pdf/
Journal impact factor: 1.8; JCR: Q3 (Pharmacology & Pharmacy)
Citations: 0
Abstract
As artificial intelligence (AI) is increasingly used to perform tasks that require human intelligence, patients who are pregnant may turn to AI for advice on over-the-counter (OTC) medications. However, medications used in pregnancy may pose serious safety concerns, and the data available to assess those concerns are limited. This study examines a chatbot's ability to provide accurate information about OTC medication use in patients who are pregnant. A prospective, descriptive design was used to compare the responses generated by Chat Generative Pre-Trained Transformer 3.5 (ChatGPT-3.5) with the information provided by UpToDate®. Eighty-seven of the top pharmacist-recommended OTC drugs in the United States (U.S.), as identified by Pharmacy Times, were assessed for safe use in pregnancy using ChatGPT-3.5. A piloted, standard prompt was entered into ChatGPT-3.5, and the responses were recorded. Two groups independently rated the responses against UpToDate for correctness, completeness, and safety using a 5-point Likert scale. After the independent evaluations, the groups discussed their findings to reach a consensus, and a third independent investigator assigned the final ratings. For correctness, the median score was 5 (interquartile range [IQR]: 5-5). For completeness, the median score was 4 (IQR: 4-5). For safety, the median score was 5 (IQR: 5-5). Despite the high overall scores, safety errors in 9% of the evaluations (n = 8), including omissions that pose a risk of serious complications, currently render the chatbot an unsafe standalone resource for this purpose.
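The summary statistics reported above (median, IQR, and the share of evaluations with a safety error) can be illustrated with a minimal Python sketch. The ratings list is hypothetical and is not the study's data. The function name `summarize_likert` and the choice of the "inclusive" quartile method are assumptions, since the abstract does not say how the quartiles were computed. The 8 and 87 come from the abstract, on the assumption that each of the 87 drugs counts as one evaluation.

```python
from statistics import median, quantiles


def summarize_likert(scores):
    """Return (median, Q1, Q3) for a list of 1-5 Likert ratings."""
    # method="inclusive" treats the list as the full population of ratings;
    # this is an assumption, other quartile conventions give slightly
    # different Q1/Q3 values on small samples.
    q1, _, q3 = quantiles(scores, n=4, method="inclusive")
    return median(scores), q1, q3


# Hypothetical ratings for illustration only; NOT the study's data.
example_scores = [5, 5, 4, 5, 3, 5, 5, 4, 5, 5]

med, q1, q3 = summarize_likert(example_scores)
print(f"median {med} (IQR: {q1}-{q3})")

# Share of evaluations with a safety error, using the counts in the abstract
# (8 errors; 87 drugs assumed to be the denominator).
safety_errors, total_evaluations = 8, 87
share = safety_errors / total_evaluations
print(f"safety errors: {share:.1%}")
```

A median of 5 with an IQR of 5-5 means at least the middle half of the ratings sit at the top of the scale, which is why a small number of low-scoring responses can coexist with a high summary score.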