Artificial intelligence in the prescription of acute medical treatments in primary healthcare - comparison of the performance of family physicians and ChatGPT.
Bárbara Lemos Pereira Simão, Carolina Moura Pereira, Mariana Jácome, Catarina Oliveira, Laura Martins Ferreira, José Miguel Paiva, Carlos Braga, Carlos Seiça Cardoso
{"title":"Artificial intelligence in the prescription of acute medical treatments in primary healthcare - comparison of the performance of family physicians and ChatGPT.","authors":"Bárbara Lemos Pereira Simão, Carolina Moura Pereira, Mariana Jácome, Catarina Oliveira, Laura Martins Ferreira, José Miguel Paiva, Carlos Braga, Carlos Seiça Cardoso","doi":"10.1186/s12875-025-02963-2","DOIUrl":null,"url":null,"abstract":"<p><strong>Introduction: </strong>Artificial intelligence (AI) is increasingly being recognized as a transformative force in healthcare, showing significant promise in supporting healthcare professionals. AI has many applications in healthcare, including providing real-time decision support, diagnosing diseases, and advancing personalized medicine. However, clinical trials and further research are needed to evaluate the practical effectiveness of AI in primary healthcare.</p><p><strong>Objective of the study: </strong>This study aims to assess the accuracy of ChatGPT, an AI-powered chatbot, in therapeutic decision-making during acute disease consultations in primary care and compare its performance to that of general family physicians. The goal was to determine how well ChatGPT could replicate the decisions made by physicians based on standard clinical guidelines.</p><p><strong>Materials and methods: </strong>A cross-sectional study was conducted at three primary healthcare units in the Central Region of Portugal. The analysis involved three phases: (1) collecting data from healthcare professionals, (2) gathering therapeutic proposals from ChatGPT v3.5 based on physician-defined diagnoses, and (3) comparing the treatments proposed by both ChatGPT v3.5 and the physicians, using the Dynamed platform as the gold standard for correct prescriptions.</p><p><strong>Results: </strong>Out of a total of 860 consultations, 138 were excluded due to non-compliance with the inclusion criteria. The analysis showed that the diagnostic accuracy of ChatGPT v3.5 and physicians co-occurred in 26.2% of cases. In 29.1% of cases, there was no agreement between the AI and the physicians' diagnoses. The therapeutic decisions made by ChatGPT v3.5 were correct in 55.6% of the cases, while physicians made correct decisions in 54.3% of the cases. The therapeutic decisions of ChatGPT v3.5 were incorrect in 5.2% of the cases, compared to 11% for physicians. Furthermore, the therapeutic proposals of ChatGPT v3.5 were 'approximate' to the correct treatment in 24% of the cases, while physicians had a 17.1% approximation rate.</p><p><strong>Conclusion: </strong>This study suggests that AI - specifically ChatGPT v3.5 - can match or even outperform physicians in terms of therapeutic decision accuracy, with a similar or slightly better success rate than human doctors. This highlights the potential for AI to act as an effective auxiliary tool rather than a replacement for healthcare professionals. AI is most effective when used in collaboration with healthcare professionals, augmenting their capabilities and improving overall healthcare delivery. 
Ultimately, AI can serve as a powerful aid to healthcare professionals, helping improve patient care and healthcare outcomes, particularly in primary care.</p>","PeriodicalId":72428,"journal":{"name":"BMC primary care","volume":"26 1","pages":"284"},"PeriodicalIF":2.6000,"publicationDate":"2025-09-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12455777/pdf/","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"BMC primary care","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1186/s12875-025-02963-2","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q2","JCRName":"MEDICINE, GENERAL & INTERNAL","Score":null,"Total":0}
Abstract
Introduction: Artificial intelligence (AI) is increasingly recognized as a transformative force in healthcare and shows significant promise in supporting healthcare professionals. Its applications include real-time decision support, disease diagnosis, and personalized medicine. However, clinical trials and further research are needed to evaluate the practical effectiveness of AI in primary healthcare.
Objective of the study: This study aims to assess the accuracy of ChatGPT, an AI-powered chatbot, in therapeutic decision-making during acute disease consultations in primary care, and to compare its performance with that of family physicians. The goal was to determine how well ChatGPT could replicate the physicians' decisions when both were judged against standard clinical guidelines.
Materials and methods: A cross-sectional study was conducted at three primary healthcare units in the Central Region of Portugal. The analysis involved three phases: (1) collecting data from healthcare professionals, (2) gathering therapeutic proposals from ChatGPT v3.5 based on physician-defined diagnoses, and (3) comparing the treatments proposed by ChatGPT v3.5 and the physicians, using the DynaMed platform as the gold standard for correct prescriptions.
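The abstract does not describe how the ChatGPT queries were issued; the study workflow may simply have used the ChatGPT web interface. As a minimal illustrative sketch only, assuming the OpenAI Python SDK, the gpt-3.5-turbo API model as the closest analogue of ChatGPT v3.5, and a hypothetical list of physician-defined diagnoses, phase 2 could be scripted along these lines:

```python
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# Hypothetical physician-defined diagnoses (phase 1 output); illustrative only.
diagnoses = ["acute pharyngitis", "uncomplicated cystitis", "acute otitis media"]

# Prompt wording is an assumption; the paper's exact prompts are not given.
PROMPT = (
    "You are assisting a family physician in primary care. "
    "For the diagnosis '{dx}', propose a first-line pharmacological "
    "treatment (drug, dose, duration) for an otherwise healthy adult."
)

proposals = {}
for dx in diagnoses:
    resp = client.chat.completions.create(
        model="gpt-3.5-turbo",  # closest API analogue of ChatGPT v3.5
        messages=[{"role": "user", "content": PROMPT.format(dx=dx)}],
    )
    proposals[dx] = resp.choices[0].message.content
```

Each free-text proposal would then be graded against DynaMed in phase 3; in a study like this, that comparison is best done by human reviewers, since automated matching of drug, dose, and duration is error-prone.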
Results: Of a total of 860 consultations, 138 were excluded for not meeting the inclusion criteria, leaving 722 for analysis. The diagnoses of ChatGPT v3.5 and the physicians coincided in 26.2% of cases, while in 29.1% of cases there was no agreement between the AI's and the physicians' diagnoses. The therapeutic decisions made by ChatGPT v3.5 were correct in 55.6% of cases, versus 54.3% for the physicians, and incorrect in 5.2% of cases, versus 11% for the physicians. Furthermore, ChatGPT v3.5's therapeutic proposals were 'approximate' to the correct treatment in 24% of cases, compared with 17.1% for the physicians.
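The figures above are simply the share of included consultations falling into each verdict category (correct / approximate / incorrect). A minimal sketch of that tally, using invented toy verdicts rather than the paper's per-case data:

```python
from collections import Counter

# Toy per-case verdicts for illustration only; the study's raw data are not published here.
verdicts = ["correct"] * 5 + ["approximate"] * 2 + ["incorrect"] * 1 + ["other"] * 1

def percentage_by_category(verdicts):
    """Share of each verdict category, as a percentage of all included cases."""
    n = len(verdicts)
    return {cat: round(100 * k / n, 1) for cat, k in Counter(verdicts).items()}

print(percentage_by_category(verdicts))
# {'correct': 55.6, 'approximate': 22.2, 'incorrect': 11.1, 'other': 11.1}
```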
Conclusion: This study suggests that AI, specifically ChatGPT v3.5, can match or slightly outperform physicians in therapeutic decision accuracy. This highlights AI's potential as an effective auxiliary tool rather than a replacement for healthcare professionals: it is most effective when used in collaboration with them, augmenting their capabilities and improving overall healthcare delivery. Ultimately, AI can serve as a powerful aid that helps improve patient care and healthcare outcomes, particularly in primary care.