Artificial Intelligence Outperforms Physicians in General Medical Knowledge, Except in the Paediatrics Domain: A Cross-Sectional Study.

IF 3.8 | Zone 3 (Medicine) | JCR Q2 (ENGINEERING, BIOMEDICAL)

Bioengineering, vol. 12, no. 6 | Pub Date: 2025-06-14 | DOI: 10.3390/bioengineering12060653

Joana Miranda, Raquel Pereira-Silva, João Guichard, Jorge Meneses, Andreia Neves Carreira, Daniela Seixas

Citations: 0

Abstract

Generative artificial intelligence (genAI) shows promising results in clinical practice. This study compared a GPT-4-turbo virtual assistant with physicians from Italy, France, Spain, and Portugal on medical knowledge derived from national exams while analysing knowledge retention over time and domain-specific performance. Via a digital platform, 17,144 physicians provided 221,574 answers to 600 exam questions between December 2022 and February 2024. Physicians were stratified by years since graduation and specialty, and the assistant answered the same questions in each native language. Differences in proportions of correct answers were tested with binomial logistic regression (odds ratios, 95% CI) or Fisher's exact test (α = 0.05). The assistant outperformed physicians in all countries (72-96% vs. 46-62%; logistic regression, p < 0.001). Physicians also trailed the assistant across most knowledge domains (p < 0.001), except paediatrics (45% vs. 52%; Fisher, p = 0.60). Accuracy declined with seniority, falling 4-10% between the youngest and oldest cohorts (logistic regression, p < 0.001). Overall, genAI exceeds practising doctors on broad medical knowledge and may help counter knowledge attrition, though paediatrics remains a domain requiring targeted refinement.
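The two tests named in the abstract can be illustrated with a minimal Python sketch. It is not the authors' analysis code: the function names and all counts are hypothetical, since the abstract reports only percentages. For a single binary predictor (assistant vs. physician), the odds ratio from a binomial logistic regression equals the cross-product ratio of the 2x2 table, and its Wald 95% CI can be computed directly from the cell counts, which is what the second function assumes.

```python
from math import comb, exp, log, sqrt


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact p-value for the 2x2 table [[a, b], [c, d]]."""
    row1, row2, col1 = a + b, c + d, a + c
    denom = comb(row1 + row2, col1)

    def prob(x):
        # Hypergeometric probability of x in the top-left cell, margins fixed.
        return comb(row1, x) * comb(row2, col1 - x) / denom

    lo, hi = max(0, col1 - row2), min(row1, col1)
    p_obs = prob(a)
    # Sum over all tables no more probable than the observed one
    # (small relative tolerance guards against floating-point ties).
    total = sum(p for p in map(prob, range(lo, hi + 1)) if p <= p_obs * (1 + 1e-9))
    return min(1.0, total)


def odds_ratio_wald_ci(a, b, c, d, z=1.96):
    """Odds ratio and Wald 95% CI; requires all four cells to be non-zero."""
    odds_ratio = (a * d) / (b * c)
    se = sqrt(1 / a + 1 / b + 1 / c + 1 / d)  # standard error of log(OR)
    return odds_ratio, exp(log(odds_ratio) - z * se), exp(log(odds_ratio) + z * se)


if __name__ == "__main__":
    # Hypothetical counts (correct, incorrect) chosen only to give 45% and 52%;
    # the real per-domain counts are not given in the abstract.
    group_a = (9, 11)      # 9 of 20 correct  -> 45%
    group_b = (520, 480)   # 520 of 1000 correct -> 52%
    p = fisher_exact_two_sided(*group_a, *group_b)
    or_, low, high = odds_ratio_wald_ci(*group_a, *group_b)
    print(f"Fisher p = {p:.3f}; OR = {or_:.2f} (95% CI {low:.2f} to {high:.2f})")
```

With a small number of questions in one group, as in these invented counts, a difference of a few percentage points is typically not significant at α = 0.05, which is the usual reason to prefer Fisher's exact test over a large-sample method in such a cell.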

Source journal

Bioengineering Chemical Engineering-Bioengineering

CiteScore

4.00

Self-citation rate

8.70%

Articles published

661

Journal description

Aims

Bioengineering (ISSN 2306-5354) provides an advanced forum for the science and technology of bioengineering. It publishes original research papers, comprehensive reviews, communications and case reports. Our aim is to encourage scientists to publish their experimental and theoretical results in as much detail as possible. All aspects of bioengineering are welcomed, from theoretical concepts to education and applications. There is no restriction on the length of the papers. The full experimental details must be provided so that the results can be reproduced. There are, in addition, four key features of this Journal:

● We are introducing a new concept in scientific and technical publications: “The Translational Case Report in Bioengineering”. It is a descriptive explanatory analysis of a transformative or translational event. Understanding that the goal of bioengineering scholarship is to advance towards a transformative or clinical solution to an identified transformative/clinical need, the translational case report is used to explore causation in order to find underlying principles that may guide other similar transformative/translational undertakings.
● Manuscripts regarding research proposals and research ideas will be particularly welcomed.
● Electronic files and software regarding the full details of the calculation and experimental procedure, if unable to be published in a normal way, can be deposited as supplementary material.
● We also accept manuscripts communicating to a broader audience with regard to research projects financed with public funds.

Scope

● Bionics and biological cybernetics: implantology; bio–abio interfaces
● Bioelectronics: wearable electronics; implantable electronics; “more than Moore” electronics; bioelectronics devices
● Bioprocess and biosystems engineering and applications: bioprocess design; biocatalysis; bioseparation and bioreactors; bioinformatics; bioenergy; etc.
● Biomolecular, cellular and tissue engineering and applications: tissue engineering; chromosome engineering; embryo engineering; cellular, molecular and synthetic biology; metabolic engineering; bio-nanotechnology; micro/nano technologies; genetic engineering; transgenic technology
● Biomedical engineering and applications: biomechatronics; biomedical electronics; biomechanics; biomaterials; biomimetics; biomedical diagnostics; biomedical therapy; biomedical devices; sensors and circuits; biomedical imaging and medical information systems; implants and regenerative medicine; neurotechnology; clinical engineering; rehabilitation engineering
● Biochemical engineering and applications: metabolic pathway engineering; modeling and simulation
● Translational bioengineering