Domain-Specific Customization for Language Models in Otolaryngology: The ENT GPT Assistant.

Impact Factor: 1.8 · Q2 (Otorhinolaryngology)
OTO Open · Publication date: 2025-05-05 · eCollection date: 2025-04-01 · DOI: 10.1002/oto2.70125
Brenton T Bicknell, Nicholas J Rivers, Adam Skelton, Delaney Sheehan, Charis Hodges, Stevan C Fairburn, Benjamin J Greene, Bharat Panuganti

Abstract

Objective: To develop and evaluate the effectiveness of domain-specific customization in large language models (LLMs) by assessing the performance of the ENT GPT Assistant (E-GPT-A), a model specifically tailored for otolaryngology.

Study design: Comparative analysis using multiple-choice questions (MCQs) from established otolaryngology resources.

Setting: Tertiary care academic hospital.

Methods: Two hundred forty clinical vignette-style MCQs were sourced from BoardVitals Otolaryngology and OTOQuest, covering a range of otolaryngology subspecialties (n = 40 per subspecialty). E-GPT-A was developed by customizing a base model to otolaryngology with targeted instructions. Its performance was compared against top-performing and widely used artificial intelligence (AI) LLMs, including GPT-3.5, GPT-4, Claude 2.0, and Claude 2.1. Accuracy was assessed across subspecialties, across question difficulty tiers, and in diagnosis and management.
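The scoring approach described above (grading each model's MCQ responses, then aggregating accuracy by subspecialty and by difficulty tier) can be sketched as follows. This is a minimal illustration, not the study's actual pipeline; the records and field layout are hypothetical.

```python
from collections import defaultdict

# Hypothetical graded responses: (subspecialty, difficulty_tier, model_answer, answer_key).
# Illustrative data only -- not the study's question set.
responses = [
    ("laryngology", "easy", "B", "B"),
    ("laryngology", "hard", "C", "A"),
    ("rhinology", "easy", "D", "D"),
    ("pediatrics", "hard", "A", "B"),
]

def accuracy_by(records, key_index):
    """Fraction of correct answers, grouped by the chosen key
    (index 0 = subspecialty, index 1 = difficulty tier)."""
    correct = defaultdict(int)
    total = defaultdict(int)
    for rec in records:
        key = rec[key_index]
        total[key] += 1
        if rec[2] == rec[3]:  # model answer matches the key
            correct[key] += 1
    return {k: correct[k] / total[k] for k in total}

print(accuracy_by(responses, 0))  # accuracy per subspecialty
print(accuracy_by(responses, 1))  # accuracy per difficulty tier
```

The same tally, run once per model, yields the per-subspecialty and per-tier comparisons reported in the Results.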

Results: E-GPT-A achieved an overall accuracy of 74.6%, outperforming GPT-3.5 (60.4%), Claude 2.0 (61.7%), Claude 2.1 (60.8%), and GPT-4 (68.3%). The model performed best in allergy and rhinology (85.0%) and laryngology (82.5%), while showing lower accuracy in pediatrics (62.5%) and facial plastics/reconstructive surgery (67.5%). Accuracy also declined as question difficulty increased. For comparison, otolaryngologists and otolaryngology trainees averaged 71.1% correct on the same question set.

Conclusion: This pilot study of E-GPT-A demonstrates the potential benefits of domain-specific customization of language models for otolaryngology. However, further development, continuous updates, and real-world validation are needed to fully assess the capabilities of LLMs in otolaryngology.

Source journal: OTO Open (Medicine, Surgery). CiteScore: 2.70. Self-citation rate: 0.00%. Articles published per year: 115. Review time: 15 weeks.