Artificial intelligence and natural language processing for automated coding of cervical and lumbar spine surgery
Ashton Huppert Steed, Kenneth Nwosu, Patrick Fillingham, Emma Federico, Sriram Thothathri, Erica Skinner, Tyler Gonzalez, Grant Muller
Journal of Neurosurgery: Spine, pp. 519-524. Published 2025-08-01. DOI: 10.3171/2025.4.SPINE241099 (https://doi.org/10.3171/2025.4.SPINE241099)
Abstract
Objective: Artificial intelligence (AI) in healthcare offers substantial opportunities to enhance efficiencies, reduce costs, and improve clinical outcomes. AI is primed to disrupt legacy healthcare processes such as coding and billing, where there is an estimated $11-$54 billion in challenged revenue annually due to billing complexities and claim denials. The purpose of this study was to assess the accuracy of a novel natural language processing algorithm (NNLPA) in coding spine operative reports, according to Current Procedural Terminology (CPT) codes, as compared to the authors' institutional human coders (IHCs).
Methods: Operative notes from consecutive adult patients undergoing cervical and lumbar spine surgery at a large academic medical center were analyzed. A 60:20 stratified split was undertaken to create training and testing populations, respectively. After training, NNLPA coding accuracy was tested against that of the IHCs, with a highly trained third-party super coder serving as the reference for accuracy. NNLPA performance was assessed via the F1 score, which combines precision and recall. Contingency tables were used to determine the sensitivity, specificity, positive predictive value, and negative predictive value. Furthermore, chi-square testing was performed to assess the independence of the metrics between the NNLPA and IHC groups.
Results: Overall, 200 operative reports were assessed in this study, and 192 CPT codes (88 cervical, 104 lumbar) were identified. NNLPA and IHC weighted mean F1 scores for lumbar spine surgery coding were 0.84 and 0.56, respectively (p < 0.05). Weighted mean sensitivity, specificity, and accuracy of NNLPA coding were 0.79, 0.99, and 0.98, respectively, and 0.59, 0.97, and 0.96 (p < 0.05) for IHCs. The NNLPA and IHC weighted mean F1 scores for cervical spine surgery coding were 0.73 and 0.68, respectively (p < 0.05). Mean specificity and accuracy for NNLPA coding were 0.99 and 0.95, respectively (p < 0.05), and 0.89 and 0.89 for IHCs (p < 0.05).
Conclusions: NNLPA performance was noninferior and possibly superior to IHC performance at spine surgery medical coding. This result contributes to the growing body of literature regarding integration of AI in spine surgery and other clinical applications. Further studies are needed to quantify the cost savings associated with using a natural language processing platform for coding rather than human coders.
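The metrics named in the Methods can be illustrated with a short Python sketch. It is not the authors' implementation: the `Counts` class, the function names, the support-based weighting of the mean F1 score, the layout of the 2x2 table passed to the chi-square test (coder group by correct/incorrect), and all example counts are assumptions made for illustration only. The chi-square statistic is the plain Pearson form without continuity correction.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Counts:
    """2x2 contingency counts for one CPT code, judged against the reference coder."""
    tp: int  # code assigned and present in the reference
    fp: int  # code assigned but absent from the reference
    fn: int  # code missed although present in the reference
    tn: int  # code correctly not assigned


def _ratio(num: float, den: float) -> float:
    # Return 0.0 for an empty denominator instead of raising ZeroDivisionError.
    return num / den if den else 0.0


def metrics(c: Counts) -> dict[str, float]:
    precision = _ratio(c.tp, c.tp + c.fp)  # positive predictive value
    recall = _ratio(c.tp, c.tp + c.fn)     # sensitivity
    return {
        "sensitivity": recall,
        "specificity": _ratio(c.tn, c.tn + c.fp),
        "ppv": precision,
        "npv": _ratio(c.tn, c.tn + c.fn),
        "accuracy": _ratio(c.tp + c.tn, c.tp + c.fp + c.fn + c.tn),
        "f1": _ratio(2 * precision * recall, precision + recall),
    }


def weighted_mean_f1(per_code: dict[str, Counts]) -> float:
    # Assumption: each code's F1 is weighted by its support,
    # i.e. the number of times the reference coder assigned it.
    total = sum(c.tp + c.fn for c in per_code.values())
    weighted = sum(metrics(c)["f1"] * (c.tp + c.fn) for c in per_code.values())
    return _ratio(weighted, total)


def chi_square_2x2(a: int, b: int, c: int, d: int) -> tuple[float, float]:
    """Pearson chi-square (no continuity correction) for the table [[a, b], [c, d]]."""
    n = a + b + c + d
    den = (a + b) * (c + d) * (a + c) * (b + d)
    if den == 0:
        return 0.0, 1.0
    chi2 = n * (a * d - b * c) ** 2 / den
    # With 1 degree of freedom, the upper-tail probability is erfc(sqrt(chi2 / 2)).
    return chi2, math.erfc(math.sqrt(chi2 / 2))


# Made-up counts for two placeholder codes; not data from the study.
example = {
    "code_A": Counts(tp=8, fp=2, fn=2, tn=28),
    "code_B": Counts(tp=6, fp=0, fn=4, tn=30),
}

if __name__ == "__main__":
    for code, counts in example.items():
        print(code, metrics(counts))
    print("weighted mean F1:", weighted_mean_f1(example))
    # Hypothetical table: rows = (NNLPA, IHC), columns = (correct, incorrect).
    print("chi-square, p:", chi_square_2x2(90, 10, 70, 30))
```

Weighting by support keeps frequently used codes from being outvoted by rare ones; an unweighted (macro) mean would be the alternative if every code should count equally.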
About the journal:
Primarily publishes original works in neurosurgery but also includes studies in clinical neurophysiology, organic neurology, ophthalmology, radiology, pathology, and molecular biology.