Comparative Analysis of Information Quality in Pediatric Otorhinolaryngology: Clinicians, Residents, and Large Language Models.

IF 2.5; Zone 3 (Medicine); JCR Q1 (Otorhinolaryngology)

Otolaryngology–Head and Neck Surgery. Pub Date: 2025-07-01. Epub Date: 2025-03-19. DOI: 10.1002/ohn.1225

Eleonora M C Trecca, Vito Carlo Alberto Caponio, Mario Turri-Zanoni, Antonella Miriam di Lullo, Michele Gaffuri, Jérôme R Lechien, Antonino Maniaci, Giuseppe Maruccio, Marella Reale, Irene Claudia Visconti, Virginia Dallari

Pages: 228-236. Publication type: Journal Article. Full text (PMC): https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12207379/pdf/

Citations: 0

Abstract

Objective: Pediatric otorhinolaryngology (ORL) addresses complex conditions in children, requiring a tailored approach for patients and families. With artificial intelligence (AI) gaining traction in medical applications, this study evaluates the quality of information provided by large language models (LLMs) in comparison to clinicians, identifying strengths and limitations in the field of pediatric ORL.

Study design: Comparative blinded study.

Setting: Controlled research environment using LLMs.

Methods: Fifty-four items of increasing difficulty, namely 18 theoretical questions, 18 clinical scenarios, and 18 patient questions, were posed to ChatGPT-3.5, -4.0, -4o, Claude-3, Gemini, Perplexity, Copilot, a second-year resident, and an expert in the field of pediatric ORL. The Quality Analysis of Medical Artificial Intelligence (QAMAI) tool was used for blinded evaluation of the quality of medical information by a panel of expert members from the Young Otolaryngologists Group of the Italian Society of ORL and the International Federation of ORL Societies.

Results: LLMs performed comparably to the specialist in theoretical questions and standardized clinical scenarios, with Bing Copilot achieving the highest QAMAI scores. However, AI responses lacked transparency in citing reliable sources and were less effective in addressing patient-centered questions. Poor interrater agreement among reviewers highlighted challenges in distinguishing human-generated from AI-generated responses. Rhinology topics received the highest scores, whereas laryngology and patient-centered questions showed lower agreement and performance.

Conclusion: LLMs show promise as supportive resources in pediatric ORL, particularly in theoretical learning and standardized cases. However, significant limitations remain, including source transparency and contextual communication in patient interactions. Human oversight is essential to mitigate risks. Future developments should focus on refining AI capabilities for evidence-based and empathetic communication to support both clinicians and families.

Source journal

Otolaryngology–Head and Neck Surgery (Medicine: Otorhinolaryngology)

CiteScore: 6.70

Self-citation rate: 2.90%

Articles published: 250

Review time: 2-4 weeks

Journal description: Otolaryngology–Head and Neck Surgery (OTO-HNS) is the official peer-reviewed publication of the American Academy of Otolaryngology–Head and Neck Surgery Foundation. The mission of Otolaryngology–Head and Neck Surgery is to publish contemporary, ethical, clinically relevant information in otolaryngology, head and neck surgery (ear, nose, throat, head, and neck disorders) that can be used by otolaryngologists, clinicians, scientists, and specialists to improve patient care and public health.