Assessing AI Accuracy in Generating CPT Codes From Surgical Operative Notes
Emily L Isch, Judith Monzy, Bhavana Thota, Sydney Somers, D Mitchell Self, E J Caterson
Journal of Craniofacial Surgery. Published 2025-03-24. DOI: 10.1097/SCS.0000000000011258 (https://doi.org/10.1097/SCS.0000000000011258)
Journal impact factor: 1.0; JCR category: Surgery (Q3)
Citations: 0
Abstract
Introduction: Accurate and efficient medical coding is essential for proper reimbursement and health care management. Current Procedural Terminology (CPT) codes, derived from operative notes, standardize medical billing but are often prone to variability and errors due to the complexity of surgical procedures. With advancements in artificial intelligence (AI), tools like ChatGPT and other large language models (LLMs) are being explored for their potential to automate coding tasks. This study evaluates the ability of LLMs to generate accurate CPT codes for craniofacial surgical procedures based on operative notes.
Methods: Operative notes for 10 craniofacial surgical cases were collected from a single surgeon at Nemours Children's Health. The notes were provided to AI tools (ChatGPT 4.0 and Gemini) to generate corresponding CPT codes. The AI-generated codes were compared against codes assigned manually by expert reviewers, and each response was classified as correct, partially correct, or incorrect.
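The abstract does not state the criteria behind the three categories. As a minimal sketch, assuming that "correct" means the AI code set exactly matches the manual code set, "partially correct" means the two sets overlap without matching, and "incorrect" means no overlap, the classification could look like the following. The function name and the placeholder code strings are hypothetical and are not real CPT codes.

```python
from collections import Counter


def classify_response(ai_codes, reference_codes):
    """Classify one AI response against the manually assigned codes.

    Assumed rule (not taken from the study): exact set match is "correct",
    any overlap is "partially correct", no overlap is "incorrect".
    """
    ai, ref = set(ai_codes), set(reference_codes)
    if ai == ref:
        return "correct"
    if ai & ref:
        return "partially correct"
    return "incorrect"


# Hypothetical cases with placeholder codes, for illustration only.
cases = [
    (["CODE_A", "CODE_B"], ["CODE_B", "CODE_A"]),  # same set, order ignored
    (["CODE_A", "CODE_C"], ["CODE_A", "CODE_B"]),  # one shared code
    (["CODE_D"], ["CODE_A"]),                      # nothing shared
]

labels = [classify_response(ai, ref) for ai, ref in cases]
tally = Counter(labels)
```

Comparing sets rather than lists ignores code order and duplicates, which is a design choice of this sketch; a real coding audit might also weigh modifiers, units, or code sequencing.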
Results: ChatGPT and Gemini demonstrated similar performance in generating CPT codes, with no statistically significant differences in accuracy or correctness between the models (P > 0.999). Gemini produced a slightly higher proportion of correct responses (30% versus 20%), whereas ChatGPT had more partially correct responses (50% versus 40%).
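To make the reported percentages concrete, the sketch below converts them to case counts and runs a two-sided Fisher's exact test on the "correct" versus "not correct" split. Two assumptions are involved: each model produced one classified response per case (so 10 responses per model), and the comparison used Fisher's exact test, which the abstract does not name. The function `fisher_exact_two_sided` is a hypothetical helper written for this illustration.

```python
from math import comb

N_CASES = 10  # from the Methods: 10 operative notes

# Counts derived from the reported percentages; "incorrect" is the remainder,
# since the three categories are exhaustive.
chatgpt = {"correct": 2, "partial": 5, "incorrect": 3}  # 20%, 50%, 30%
gemini = {"correct": 3, "partial": 4, "incorrect": 3}   # 30%, 40%, 30%


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]]."""
    row1, row2 = a + b, c + d
    col1 = a + c
    denom = comb(row1 + row2, col1)

    def weight(k):
        # Unnormalised hypergeometric weight of the table with k in the top-left cell.
        return comb(row1, k) * comb(row2, col1 - k)

    observed = weight(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    # Sum over all tables that are no more probable than the observed one.
    extreme = sum(weight(k) for k in range(lo, hi + 1) if weight(k) <= observed)
    return extreme / denom


p_correct = fisher_exact_two_sided(
    chatgpt["correct"], N_CASES - chatgpt["correct"],
    gemini["correct"], N_CASES - gemini["correct"],
)
print(f"Fisher's exact test, correct vs not correct: p = {p_correct:.3f}")
```

Under these assumptions the observed table (2 of 10 versus 3 of 10) is the most probable one given the margins, so the two-sided p-value is 1, which is consistent with the reported P > 0.999. Integer weights are compared rather than floating-point probabilities to avoid rounding issues when deciding which tables count as extreme.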
Conclusions: This study demonstrates that AI may be a clinically valuable resource for craniofacial CPT coding, reducing administrative burden and increasing coding accuracy. Findings from this research could inform the integration of AI into medical billing practices, promoting efficiency in surgical specialties. Future research will explore generalizability to other surgical domains and refinement of AI models for coding tasks.
About the journal:
The Journal of Craniofacial Surgery serves as a forum of communication for all those involved in craniofacial surgery, maxillofacial surgery and pediatric plastic surgery. Coverage ranges from practical aspects of craniofacial surgery to the basic science that underlies surgical practice. The journal publishes original articles, scientific reviews, editorials and invited commentary, abstracts and selected articles from international journals, and occasional international bibliographies in craniofacial surgery.