Performance of generative AI across ENT tasks: A systematic review and meta-analysis

IF 1.5 | CAS Zone 4 (Medicine) | JCR Q2, Otorhinolaryngology
Sholem Hack, Rebecca Attal, Armin Farzad, Eran E. Alon, Eran Glikson, Eric Remer, Alberto Maria Saibene, Habib G Zalzal
Auris Nasus Larynx, Vol. 52, Issue 5, Pages 585-596 (published 2025-09-04)
DOI: 10.1016/j.anl.2025.08.010
https://www.sciencedirect.com/science/article/pii/S0385814625001269
Citations: 0

Abstract

Objective

To systematically evaluate the diagnostic accuracy, educational utility, and communication potential of generative AI, particularly Large Language Models (LLMs) such as ChatGPT, in otolaryngology.

Data Sources

A comprehensive search of PubMed, Embase, Scopus, Web of Science, and IEEE Xplore identified English-language peer-reviewed studies from January 2022 to March 2025.

Review Methods

Eligible studies evaluated text-based generative AI models used in otolaryngology. Two reviewers screened the studies and appraised them using the JBI and QUADAS-2 tools. A random-effects meta-analysis was conducted on diagnostic accuracy outcomes, with subgroup analyses by task type and model version.
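The abstract specifies a random-effects meta-analysis but not the estimator or the scale on which accuracies were pooled. The sketch below assumes a common approach for pooling proportions: logit-transform each accuracy, estimate between-study variance with the DerSimonian-Laird method, then back-transform the pooled estimate and its 95% CI. It is not necessarily the authors' method, and the function name `pool_proportions` and the input counts are hypothetical.

```python
import math
from dataclasses import dataclass


@dataclass
class PooledResult:
    estimate: float  # pooled accuracy, back-transformed from the logit scale
    ci_low: float    # lower bound of the 95% confidence interval
    ci_high: float   # upper bound of the 95% confidence interval
    tau2: float      # DerSimonian-Laird between-study variance (logit scale)
    i2: float        # I^2 heterogeneity, in percent


def _logit(p: float) -> float:
    return math.log(p / (1.0 - p))


def _expit(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def pool_proportions(correct: list[int], total: list[int], z: float = 1.96) -> PooledResult:
    """Hypothetical helper: DerSimonian-Laird random-effects pooling of
    accuracy proportions on the logit scale (an assumed choice of method)."""
    ys, vs = [], []
    for c, n in zip(correct, total):
        # Add 0.5 to both correct and incorrect counts when either is zero,
        # so the logit and its variance stay finite.
        if c == 0 or c == n:
            c, n = c + 0.5, n + 1.0
        ys.append(_logit(c / n))
        vs.append(1.0 / c + 1.0 / (n - c))  # approximate variance of logit(p)

    # Fixed-effect (inverse-variance) estimate and Cochran's Q.
    w = [1.0 / v for v in vs]
    y_fixed = sum(wi * yi for wi, yi in zip(w, ys)) / sum(w)
    q = sum(wi * (yi - y_fixed) ** 2 for wi, yi in zip(w, ys))
    df = len(ys) - 1

    # DerSimonian-Laird moment estimate of tau^2, truncated at zero.
    c_dl = sum(w) - sum(wi ** 2 for wi in w) / sum(w)
    tau2 = max(0.0, (q - df) / c_dl) if c_dl > 0 else 0.0

    # Random-effects weights add tau^2 to each within-study variance.
    w_re = [1.0 / (v + tau2) for v in vs]
    y_re = sum(wi * yi for wi, yi in zip(w_re, ys)) / sum(w_re)
    se_re = math.sqrt(1.0 / sum(w_re))

    i2 = max(0.0, (q - df) / q) * 100.0 if q > 0 else 0.0
    return PooledResult(
        estimate=_expit(y_re),
        ci_low=_expit(y_re - z * se_re),
        ci_high=_expit(y_re + z * se_re),
        tau2=tau2,
        i2=i2,
    )
```

For example, `pool_proportions([80, 45, 120], [100, 60, 150])` would pool three hypothetical model-task pairs. Working on the logit scale keeps the back-transformed interval inside [0, 1], which is also why such intervals are usually asymmetric around the point estimate.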

Results

Ninety-one studies were included; 61 reported quantitative outcomes. Of these, 43 provided diagnostic accuracy data across 59 model-task pairs. Pooled diagnostic accuracy was 72.7% (95% CI: 67.4–77.6%; I² = 93.8%). Accuracy was highest in diagnostic imaging (84.9%) and education (83.0%) tasks, and lowest in clinical decision support (CDS; 67.1%). GPT-4 consistently outperformed GPT-3.5 in both the education and CDS domains. Hallucinations and performance variability were noted in complex clinical reasoning tasks.
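The I² value quantifies how much of the spread in accuracy across model-task pairs exceeds what sampling error alone would produce. Under the standard definitions, with k pooled estimates y_i and inverse-variance weights w_i (the abstract does not report Q or the exact k used), it is:

```latex
Q = \sum_{i=1}^{k} w_i \,(y_i - \hat{\theta}_{\mathrm{FE}})^2,
\qquad
\hat{\theta}_{\mathrm{FE}} = \frac{\sum_{i=1}^{k} w_i y_i}{\sum_{i=1}^{k} w_i},
\qquad
I^2 = \max\!\left(0,\ \frac{Q - (k - 1)}{Q}\right) \times 100\%
```

Read this way, I² = 93.8% means roughly 94% of the observed variability is attributed to real differences between studies, models, and tasks rather than chance, so the pooled 72.7% is best read as an average over heterogeneous settings rather than an accuracy any single deployment should expect.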

Conclusion

Generative AI performs well in structured otolaryngology tasks, particularly education and communication. However, its inconsistent performance in clinical reasoning tasks limits standalone use. Future research should focus on hallucination mitigation, standardized evaluation, and prospective validation to guide safe clinical integration.
Source journal
Auris Nasus Larynx (Medicine, Otorhinolaryngology)
CiteScore
3.40
Self-citation rate
5.90%
Articles published
169
Review time
30 days
About the journal: The international journal Auris Nasus Larynx provides the opportunity for rapid, carefully reviewed publications concerning the fundamental and clinical aspects of otorhinolaryngology and related fields. This includes otology, neurotology, bronchoesophagology, laryngology, rhinology, allergology, head and neck medicine and oncologic surgery, maxillofacial and plastic surgery, audiology, and speech science. Original papers, short communications and original case reports can be submitted. Reviews on recent developments are invited regularly and Letters to the Editor commenting on papers or any aspect of Auris Nasus Larynx are welcomed. Founded in 1973 and previously published by the Society for Promotion of International Otorhinolaryngology, the journal is now the official English-language journal of the Oto-Rhino-Laryngological Society of Japan, Inc. The aim of its new international Editorial Board is to make Auris Nasus Larynx an international forum for high quality research and clinical sciences.