Evaluating the Sensitivity, Specificity, and Accuracy of ChatGPT-3.5, ChatGPT-4, Bing AI, and Bard Against Conventional Drug-Drug Interactions Clinical Tools.

IF 2.2 · Q2 · HEALTH CARE SCIENCES & SERVICES
Drug, Healthcare and Patient Safety · Pub Date: 2023-09-20 · eCollection Date: 2023-01-01 · DOI: 10.2147/DHPS.S425858
Fahmi Y Al-Ashwal, Mohammed Zawiah, Lobna Gharaibeh, Rana Abu-Farha, Ahmad Naoras Bitar
Citations: 0

Abstract


Evaluating the Sensitivity, Specificity, and Accuracy of ChatGPT-3.5, ChatGPT-4, Bing AI, and Bard Against Conventional Drug-Drug Interactions Clinical Tools.

Background: AI platforms are equipped with advanced algorithms that have the potential to offer a wide range of applications in healthcare services. However, information about the accuracy of AI chatbots against conventional drug-drug interaction tools is limited. This study aimed to assess the sensitivity, specificity, and accuracy of ChatGPT-3.5, ChatGPT-4, Bing AI, and Bard in predicting drug-drug interactions.

Methods: AI-based chatbots (ie, ChatGPT-3.5, ChatGPT-4, Microsoft Bing AI, and Google Bard) were compared for their abilities to detect clinically relevant drug-drug interactions (DDIs) for 255 drug pairs. Descriptive statistics, such as specificity, sensitivity, accuracy, negative predictive value (NPV), and positive predictive value (PPV), were calculated for each tool.
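The metrics named above all derive from a standard 2×2 confusion matrix of chatbot verdicts against the reference DDI tool. The sketch below shows how they are computed; the counts are illustrative only and are not the study's actual data.

```python
# Minimal sketch of the descriptive statistics the study reports, computed
# from true/false positive and negative counts for one chatbot against a
# reference DDI tool. Counts are hypothetical, chosen only to sum to 255 pairs.

def ddi_metrics(tp: int, fp: int, tn: int, fn: int) -> dict:
    """Return sensitivity, specificity, accuracy, PPV, and NPV."""
    sensitivity = tp / (tp + fn)                 # true positive rate
    specificity = tn / (tn + fp)                 # true negative rate
    accuracy = (tp + tn) / (tp + fp + tn + fn)   # overall agreement
    ppv = tp / (tp + fp)                         # positive predictive value
    npv = tn / (tn + fn)                         # negative predictive value
    return {"sensitivity": sensitivity, "specificity": specificity,
            "accuracy": accuracy, "ppv": ppv, "npv": npv}

# Illustrative example: a chatbot flags 120 of 130 true interactions
# (10 missed) and wrongly flags 50 of 125 non-interacting pairs.
metrics = ddi_metrics(tp=120, fp=50, tn=75, fn=10)
print(metrics)
```

Note that, as the Results illustrate, specificity and accuracy can diverge sharply from sensitivity: a chatbot that over-flags interactions will score well on sensitivity while its specificity collapses.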

Results: When a subscription tool was used as a reference, the specificity ranged from a low of 0.372 (ChatGPT-3.5) to a high of 0.769 (Microsoft Bing AI). Also, Microsoft Bing AI had the highest performance with an accuracy score of 0.788, with ChatGPT-3.5 having the lowest accuracy rate of 0.469. There was an overall improvement in performance for all the programs when the reference tool switched to a free DDI source, but still, ChatGPT-3.5 had the lowest specificity (0.392) and accuracy (0.525), and Microsoft Bing AI demonstrated the highest specificity (0.892) and accuracy (0.890). When assessing the consistency of accuracy across two different drug classes, ChatGPT-3.5 and ChatGPT-4 showed the highest variability in accuracy. In addition, ChatGPT-3.5, ChatGPT-4, and Bard exhibited the highest fluctuations in specificity when analyzing two medications belonging to the same drug class.

Conclusion: Bing AI had the highest accuracy and specificity, outperforming Google's Bard, ChatGPT-3.5, and ChatGPT-4. The findings highlight the significant potential these AI tools hold in transforming patient care. While the current AI platforms evaluated are not without limitations, their ability to quickly analyze potentially significant interactions with good sensitivity suggests a promising step towards improved patient safety.

Source journal: Drug, Healthcare and Patient Safety (HEALTH CARE SCIENCES & SERVICES)
CiteScore: 4.10 · Self-citation rate: 0.00% · Articles per year: 24 · Review time: 16 weeks