Evaluating the Efficacy of Large Language Models in CPT Coding for Craniofacial Surgery: A Comparative Analysis.

IF 1.0 · Zone 4 (Medicine) · JCR Q3 (Surgery)
Emily L Isch, Advith Sarikonda, Abhijeet Sambangi, Angeleah Carreras, Adrija Sircar, D Mitchell Self, Theodore E Habarth-Morales, E J Caterson, Mario Aycart
Journal of Craniofacial Surgery. Journal Article. Published 2024-09-02. DOI: 10.1097/SCS.0000000000010575 (https://doi.org/10.1097/SCS.0000000000010575)

Abstract

Background: The advent of Large Language Models (LLMs) like ChatGPT has introduced significant advancements in various surgical disciplines. These developments have led to an increased interest in the utilization of LLMs for Current Procedural Terminology (CPT) coding in surgery. With CPT coding being a complex and time-consuming process, often exacerbated by the scarcity of professional coders, there is a pressing need for innovative solutions to enhance coding efficiency and accuracy.

Methods: This observational study evaluated the effectiveness of 5 publicly available large language models (Perplexity.AI, Bard, BingAI, ChatGPT 3.5, and ChatGPT 4.0) in accurately identifying CPT codes for craniofacial procedures. A consistent query format was employed to test each model, ensuring the inclusion of detailed procedure components where necessary. The responses were classified as correct, partially correct, or incorrect based on their alignment with established CPT coding for the specified procedures.
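The three-way grading described in the Methods can be sketched in code. This is a minimal illustration, not the authors' procedure: the abstract does not define "partially correct", so the sketch assumes a response is correct when its codes exactly match the reference set, partially correct when only some overlap, and incorrect otherwise. The function names, the five-digit pattern, and the example codes are hypothetical placeholders, not real CPT assignments.

```python
import re
from typing import Set

# Assumption: only five-digit numeric codes are considered.
CODE_PATTERN = re.compile(r"\b\d{5}\b")


def extract_codes(response: str) -> Set[str]:
    """Collect every standalone five-digit token in a chatbot response."""
    return set(CODE_PATTERN.findall(response))


def grade_response(response: str, reference: Set[str]) -> str:
    """Grade a response against the reference code set (assumed criteria)."""
    found = extract_codes(response)
    if found == reference:
        return "correct"            # exact match with the reference set
    if found & reference:
        return "partially correct"  # some, but not all, codes agree
    return "incorrect"              # no overlap with the reference set


if __name__ == "__main__":
    # Placeholder codes for illustration only.
    reference = {"11111", "22222"}
    for reply in (
        "Codes 11111 and 22222 apply.",
        "Use 11111 only.",
        "I would suggest 33333.",
    ):
        print(reply, "->", grade_response(reply, reference))
```

Under these assumed criteria, a response that lists every reference code plus an extra one is graded partially correct rather than correct; a different rubric would need a different comparison.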

Results: The results indicate that while there is no overall significant association between the type of AI model and the correctness of CPT code identification, there are notable differences in performance for simple and complex CPT codes among the models. Specifically, ChatGPT 4.0 showed higher accuracy for complex codes, whereas Perplexity.AI and Bard were more consistent with simple codes.
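A test of association between model and correctness category can be illustrated with a contingency table. The abstract does not name the statistical test used, so the sketch below assumes a Pearson chi-square test of independence. The function name is hypothetical and the counts are placeholders, not study data. It also assumes every row and column total is nonzero.

```python
from typing import List, Tuple


def chi_square(table: List[List[int]]) -> Tuple[float, int]:
    """Return the Pearson chi-square statistic and degrees of freedom.

    Rows are models; columns are correctness categories.
    Assumes all row and column totals are nonzero.
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)

    statistic = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count if model and correctness were independent.
            expected = row_totals[i] * col_totals[j] / total
            statistic += (observed - expected) ** 2 / expected

    dof = (len(table) - 1) * (len(table[0]) - 1)
    return statistic, dof


if __name__ == "__main__":
    # Placeholder counts: [correct, partially correct, incorrect] per model.
    placeholder = [
        [10, 5, 5],
        [10, 5, 5],
    ]
    print(chi_square(placeholder))
```

The statistic would then be compared against the chi-square distribution with the returned degrees of freedom. With small expected counts, an exact test may be more appropriate.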

Discussion: The use of AI chatbots for CPT coding in craniofacial surgery presents a promising avenue for reducing the administrative burden and associated costs of manual coding. Despite the lower accuracy rates compared with specialized, trained algorithms, the accessibility and minimal training requirements of the AI chatbots make them attractive alternatives. The study also suggests that priming AI models with operative notes may enhance their accuracy, offering a resource-efficient strategy for improving CPT coding in clinical practice.

Conclusions: This study highlights the feasibility and potential benefits of integrating LLMs into the CPT coding process for craniofacial surgery. The findings advocate for further refinement and training of AI models to improve their accuracy and practicality, suggesting a future where AI-assisted coding could become a standard component of surgical workflows, aligning with the ongoing digital transformation in health care.

Source journal: Journal of Craniofacial Surgery
CiteScore: 1.70
Self-citation rate: 11.10%
Articles published: 968
Review time: 1.5 months
About the journal: The Journal of Craniofacial Surgery serves as a forum of communication for all those involved in craniofacial surgery, maxillofacial surgery, and pediatric plastic surgery. Coverage ranges from practical aspects of craniofacial surgery to the basic science that underlies surgical practice. The journal publishes original articles, scientific reviews, editorials and invited commentary, abstracts and selected articles from international journals, and occasional international bibliographies in craniofacial surgery.