AI in clinical decision-making: ChatGPT-4 vs. Llama2 for otolaryngology cases

IF 1.9, Zone 3 (Medicine), Q2 OTORHINOLARYNGOLOGY

European Archives of Oto-Rhino-Laryngology. Pub Date: 2025-06-01. Epub Date: 2025-04-12. DOI: 10.1007/s00405-025-09371-3

Antonino Maniaci, Cosima C Hoch, Lise Sogalow, Benedikt Schmidl, Jerome R Lechien

Pages: 3293-3302. Publication type: Journal Article.

Citations: 0

Abstract

AI in clinical decision-making: ChatGPT-4 vs. Llama2 for otolaryngology cases.

Purpose: To evaluate the diagnostic accuracy, appropriateness of additional examination recommendations, and consistency of therapeutic regimens by ChatGPT-4 and Llama2 based on real otolaryngology cases.

Methods: A prospective controlled study was conducted on 98 anonymized otolaryngology cases. Clinical information was entered into ChatGPT-4 and Llama2 to obtain primary diagnoses, additional examination recommendations, and treatment strategies. Two independent otolaryngologists rated the AI outputs with the Artificial Intelligence Performance Instrument (AIPI) for diagnostic accuracy, appropriateness of examinations, and adequacy of treatment. The AI systems' outputs were compared statistically with expert decisions. Interrater reliability was evaluated with kappa statistics.
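The interrater reliability step can be illustrated with a minimal sketch. The abstract does not say which kappa variant was used, so this assumes unweighted Cohen's kappa on categorical ratings from two raters; the function name `cohen_kappa` and the example ratings are hypothetical and are not data from the study.

```python
from collections import Counter


def cohen_kappa(rater_a, rater_b):
    """Unweighted Cohen's kappa for two raters scoring the same items.

    Assumption: ratings are categorical labels (for example 1 = appropriate,
    0 = inappropriate). The study may have used a different kappa variant.
    """
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("Both raters must score the same non-empty set of items")

    n = len(rater_a)
    # Observed agreement: share of items on which the raters give the same label.
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n

    # Chance agreement: product of each rater's marginal label frequencies.
    counts_a = Counter(rater_a)
    counts_b = Counter(rater_b)
    expected = sum(counts_a[label] * counts_b[label] for label in counts_a) / (n * n)

    if expected == 1.0:
        # Both raters used a single identical label; agreement is trivially perfect.
        return 1.0
    return (observed - expected) / (1.0 - expected)


# Hypothetical ratings for ten AI outputs (not taken from the paper).
rater_1 = [1, 1, 0, 1, 0, 1, 1, 0, 1, 1]
rater_2 = [1, 1, 0, 1, 1, 1, 1, 0, 1, 0]
kappa = cohen_kappa(rater_1, rater_2)
print(f"kappa = {kappa:.3f}")
```

Kappa corrects raw agreement for the agreement expected by chance, which is why it is preferred over simple percent agreement when one rating category dominates.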

Results: ChatGPT-4 reached the correct diagnosis in 82% of cases, outperforming Llama2 at 76%. For additional examinations, ChatGPT-4 suggested relevant and appropriate tests in 88% of the cases, while Llama2 did so in 83%. Treatment appropriateness was achieved in 80% of the cases with ChatGPT-4 and 72% with Llama2. Both systems occasionally suggested inappropriate tests. Interrater reliability for AIPI scores was high (kappa = 0.85).
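To gauge how precise percentages based on 98 cases are, one can attach approximate confidence intervals to them. The sketch below assumes every reported percentage uses all 98 cases as its denominator and applies the Wilson score interval; the abstract reports neither case counts nor intervals, so the helper `wilson_interval` and its output are illustrative only and are not results from the paper.

```python
import math


def wilson_interval(p_hat, n, z=1.96):
    """Wilson score interval for a proportion p_hat observed on n items.

    z = 1.96 corresponds to an approximate 95% confidence level.
    """
    if n <= 0 or not 0.0 <= p_hat <= 1.0:
        raise ValueError("n must be positive and p_hat must lie in [0, 1]")
    denom = 1.0 + z * z / n
    centre = (p_hat + z * z / (2.0 * n)) / denom
    half_width = z * math.sqrt(p_hat * (1.0 - p_hat) / n + z * z / (4.0 * n * n)) / denom
    return centre - half_width, centre + half_width


N_CASES = 98  # assumption: each percentage is computed over all 98 cases

# Proportions as reported in the abstract.
reported = {
    "ChatGPT-4 diagnosis": 0.82,
    "Llama2 diagnosis": 0.76,
    "ChatGPT-4 examinations": 0.88,
    "Llama2 examinations": 0.83,
    "ChatGPT-4 treatment": 0.80,
    "Llama2 treatment": 0.72,
}

for label, p in reported.items():
    low, high = wilson_interval(p, N_CASES)
    print(f"{label}: {p:.0%} (approx. 95% CI {low:.1%} to {high:.1%})")
```

Because both models were run on the same cases, a formal comparison would need a paired test on per-case results, which the abstract does not provide; the intervals only indicate the uncertainty around each individual percentage.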

Conclusion: ChatGPT-4 and Llama2 show considerable potential as clinical decision-support tools in otolaryngology, with ChatGPT-4 performing better on all three measures. However, their non-relevant recommendations indicate the need for further refinement and human oversight to ensure safe application in clinical practice.

Source journal

European Archives of Oto-Rhino-Laryngology (Medicine, Otorhinolaryngology)

CiteScore: 5.30

Self-citation rate: 7.70%

Publication volume: 537

Review time: 2-4 weeks

About the journal: Official Journal of the European Union of Medical Specialists – ORL Section and Board. Official Journal of the Confederation of European Oto-Rhino-Laryngology Head and Neck Surgery. "European Archives of Oto-Rhino-Laryngology" publishes original clinical reports and clinically relevant experimental studies, as well as short communications presenting new results of special interest. With peer review by a respected international editorial board and prompt English-language publication, the journal provides rapid dissemination of information by authors from around the world. This makes it the journal of choice for readers who want to stay informed, at an international level, about the state of the art in basic sciences and in the diagnosis and management of diseases of the head and neck. European Archives of Oto-Rhino-Laryngology was founded in 1864 as "Archiv für Ohrenheilkunde" by A. von Tröltsch, A. Politzer and H. Schwartze.