An Open-Architecture AI Model for CPT Coding in Breast Surgery: Development, Validation, and Prospective Testing

Mohamad El Moheb, Kristin Putman, Olivia Sears, Melina R. Kibbe, K. Craig Kent, David R. Brenin, Allan Tsung

Annals of Surgery (Q1, SURGERY) · Published 2025-06-16 · DOI: 10.1097/sla.0000000000006793
Citations: 0
Abstract
OBJECTIVE
To develop, validate, and prospectively test an open-architecture, transformer-based Artificial Intelligence (AI) model to extract procedure codes from free-text breast surgery operative notes.
SUMMARY OF BACKGROUND DATA
Operative note coding is time-intensive and error-prone, leading to lost revenue and compliance risks. While AI offers potential solutions, adoption has been limited due to proprietary, closed-source systems lacking transparency and standardized validation.
METHODS
We included all institutional breast surgery operative notes from July 2017 to December 2023. Expert medical coders manually reviewed and validated surgeon-assigned Current Procedural Terminology (CPT) codes, establishing a reference standard. We developed and validated an AI model to predict CPT codes from operative notes using two versions of the pre-trained GatorTron clinical language model: a compact 345 million-parameter model and a larger 3.9 billion-parameter model, each fine-tuned on our labeled dataset. Performance was evaluated using the area under the precision-recall curve (AUPRC). Prospective testing was conducted on operative notes from May to October 2024.
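The task described here is multi-label text classification: each operative note can carry several CPT codes, and the model predicts the full set. The study fine-tunes GatorTron for this; as a minimal, illustrative stand-in, the same problem shape can be sketched with a classical TF-IDF plus one-vs-rest logistic-regression pipeline (the notes and CPT codes below are synthetic placeholders, not the study's data):

```python
# Sketch of multi-label CPT-code prediction from operative-note text.
# NOT the paper's GatorTron pipeline: a TF-IDF + logistic-regression
# baseline is substituted so the example stays small and runnable.
# Notes and code assignments below are synthetic placeholders.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.multiclass import OneVsRestClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import MultiLabelBinarizer

notes = [
    "partial mastectomy with sentinel lymph node biopsy",
    "simple mastectomy, left breast",
    "excision of benign breast lesion",
    "partial mastectomy, right breast",
]
codes = [["19301", "38525"], ["19303"], ["19120"], ["19301"]]

mlb = MultiLabelBinarizer()          # one indicator column per CPT code
Y = mlb.fit_transform(codes)

clf = make_pipeline(
    TfidfVectorizer(ngram_range=(1, 2)),
    OneVsRestClassifier(LogisticRegression(max_iter=1000)),
)
clf.fit(notes, Y)

# Predicted code set for a new note (tuple of CPT strings per sample)
pred = mlb.inverse_transform(clf.predict(["partial mastectomy of the right breast"]))
print(pred)
```

A transformer-based approach would replace the TF-IDF features with contextual embeddings and a sigmoid classification head over the code vocabulary, but the label binarization and one-prediction-per-code structure are the same.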
RESULTS
Our dataset included 3,259 operative notes with 8,036 CPT codes. Surgeon coding discrepancies were present in 12% of cases (overcoding: 8%, undercoding: 10%). The AI model showed strong alignment with the reference standard (compact version AUPRC: 0.976 [0.970, 0.983], large version AUPRC: 0.981 [0.977, 0.986]) on cross-validation, outperforming surgeons (AUPRC: 0.937). Prospective testing on 268 notes confirmed strong real-world performance.
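AUPRC summarizes the precision-recall trade-off across all decision thresholds, which suits imbalanced multi-label problems where most codes are absent from most notes. A hedged sketch of how a micro-averaged AUPRC could be computed for multi-label predictions (the labels and scores below are synthetic, not the study's data):

```python
# Sketch of micro-averaged AUPRC for multi-label predictions, the metric
# family reported in the study. Labels and scores here are synthetic.
import numpy as np
from sklearn.metrics import average_precision_score

# rows = operative notes, columns = candidate CPT codes (1 = code applies)
y_true = np.array([[1, 0, 1],
                   [0, 1, 0],
                   [1, 1, 0]])
# model-assigned probability for each (note, code) pair
y_score = np.array([[0.92, 0.10, 0.81],
                    [0.15, 0.88, 0.20],
                    [0.75, 0.60, 0.05]])

# micro-averaging flattens all (note, code) pairs before computing AP
auprc = average_precision_score(y_true, y_score, average="micro")
print(round(auprc, 3))  # → 1.0 here: every true code outscores every absent one
```

In this toy case the ranking is perfect, so AUPRC is 1.0; real systems, like the models reported above, land below that ceiling.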
CONCLUSIONS
Our open-architecture AI model demonstrated high performance in automating CPT code extraction, offering a scalable and transparent solution to improve surgical coding efficiency. Future work will assess whether AI can surpass human coders in accuracy and reliability.
About the journal
The Annals of Surgery is a renowned surgical journal, recognized globally for its scholarly influence. It serves the international medical community by disseminating knowledge of important developments in surgical science and practice, and surgeons regularly turn to it to stay current on innovative practices and techniques. The journal also offers special editorial features, such as "Advances in Surgical Technique," that provide timely coverage of ongoing clinical issues, and it publishes monthly review articles addressing the latest concerns in surgical practice.