A clinician-based comparative study of large language models in answering medical questions: the case of asthma.

IF 2.1 · CAS Tier 3 (Medicine) · JCR Q2 (Pediatrics)
Frontiers in Pediatrics · Pub Date: 2025-04-25 · eCollection Date: 2025-01-01 · DOI: 10.3389/fped.2025.1461026
Yong Yin, Mei Zeng, Hansong Wang, Haibo Yang, Caijing Zhou, Feng Jiang, Shufan Wu, Tingyue Huang, Shuahua Yuan, Jilei Lin, Mingyu Tang, Jiande Chen, Bin Dong, Jiajun Yuan, Dan Xie
Citations: 0

Abstract

Objective: This study aims to evaluate and compare the performance of four major large language models (GPT-3.5, GPT-4.0, YouChat, and Perplexity) in answering 32 common asthma-related questions.

Materials and methods: Seventy-five clinicians from various tertiary hospitals participated in this study. Each clinician evaluated the responses generated by the four large language models (LLMs) to 32 common clinical questions related to pediatric asthma. Using predefined criteria, participants rated the accuracy, correctness, completeness, and practicality of the LLMs' answers, assigning numerical scores to quantify each model's performance on pediatric asthma-related questions.
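The rating protocol described above reduces to a straightforward score-aggregation step. The sketch below illustrates it with entirely hypothetical data; the model names and dimensions come from the abstract, but the 1–5 scale, the per-clinician scores, and the aggregation by simple means are assumptions (the paper's actual rubric and statistics are not given here):

```python
from statistics import mean

DIMENSIONS = ["accuracy", "correctness", "completeness", "practicality"]
MODELS = ["GPT-3.5", "GPT-4.0", "YouChat", "Perplexity"]

# Hypothetical ratings: each inner list holds one score per clinician
# (three illustrative raters here, not the study's 75).
ratings = {
    "GPT-3.5":    {"accuracy": [4, 5, 4], "correctness": [4, 4, 5],
                   "completeness": [4, 4, 4], "practicality": [4, 5, 4]},
    "GPT-4.0":    {"accuracy": [5, 5, 4], "correctness": [5, 4, 5],
                   "completeness": [5, 5, 4], "practicality": [5, 5, 4]},
    "YouChat":    {"accuracy": [3, 2, 3], "correctness": [3, 3, 2],
                   "completeness": [2, 3, 3], "practicality": [3, 2, 3]},
    "Perplexity": {"accuracy": [3, 3, 4], "correctness": [3, 4, 3],
                   "completeness": [3, 3, 3], "practicality": [4, 3, 3]},
}

def mean_scores(ratings):
    """Average each model's per-clinician scores on every dimension."""
    return {m: {d: mean(ratings[m][d]) for d in DIMENSIONS} for m in MODELS}

scores = mean_scores(ratings)

# Rank models by their overall mean across all four dimensions.
overall = {m: mean(scores[m][d] for d in DIMENSIONS) for m in MODELS}
ranking = sorted(overall, key=overall.get, reverse=True)
print(ranking)
```

With these placeholder scores the ranking mirrors the abstract's reported ordering (GPT-4.0 highest, YouChat lowest), but a real analysis would also need a paired significance test across clinicians rather than raw means alone.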

Results: GPT-4.0 scored highest across all dimensions, while YouChat scored lowest. GPT-3.5 and GPT-4.0 both outperformed the other two models, though the differences between GPT-3.5 and GPT-4.0, and between YouChat and Perplexity, were not statistically significant.

Conclusion: GPT and other large language models can answer medical questions with a reasonable degree of completeness and accuracy. However, clinicians should critically appraise internet-sourced information, distinguishing accurate from inaccurate content, rather than accepting model outputs uncritically. With advances in the underlying technology, LLMs may one day become a safe option for doctors seeking information.

Source journal

Frontiers in Pediatrics (Medicine: Pediatrics, Perinatology and Child Health)
CiteScore: 3.60 · Self-citation rate: 7.70% · Articles per year: 2132 · Review time: 14 weeks

About the journal: Frontiers in Pediatrics publishes rigorously peer-reviewed research broadly across the field, from basic to clinical research that meets ongoing challenges in pediatric patient care and child health. Field Chief Editors Arjan Te Pas at Leiden University and Michael L. Moritz at the Children's Hospital of Pittsburgh are supported by an outstanding Editorial Board of international experts. This multidisciplinary open-access journal is at the forefront of disseminating and communicating scientific knowledge and impactful discoveries to researchers, academics, clinicians, and the public worldwide. Frontiers in Pediatrics also features Research Topics, Frontiers' special theme-focused issues managed by Guest Associate Editors, addressing important areas in pediatrics. In this fashion, Frontiers serves as an outlet to publish the broadest aspects of pediatrics in both basic and clinical research, including high-quality reviews, case reports, editorials, and commentaries related to all aspects of pediatrics.