Artificial intelligence and natural language processing for automated coding of cervical and lumbar spine surgery.

IF 3.1; Zone 2 (Medicine); JCR Q2 (Clinical Neurology)

Journal of neurosurgery. Spine. Pub Date: 2025-08-01. Print Date: 2025-10-01. Pages: 519-524. DOI: 10.3171/2025.4.SPINE241099

Ashton Huppert Steed, Kenneth Nwosu, Patrick Fillingham, Emma Federico, Sriram Thothathri, Erica Skinner, Tyler Gonzalez, Grant Muller

Citations: 0

Abstract

Objective: Artificial intelligence (AI) in healthcare offers substantial opportunities to enhance efficiencies, reduce costs, and improve clinical outcomes. AI is primed to disrupt legacy healthcare processes such as coding and billing, where there is an estimated $11-$54 billion in challenged revenue annually due to billing complexities and claim denials. The purpose of this study was to assess the accuracy of a novel natural language processing algorithm (NNLPA) in coding spine operative reports, according to Current Procedural Terminology (CPT) codes, as compared to the authors' institutional human coders (IHCs).

Methods: Operative notes from consecutive adult patients undergoing cervical and lumbar spine surgery at a large academic medical center were analyzed. A 60:20 stratified split was undertaken to create training and testing populations, respectively. After training, NNLPA coding accuracy was tested against the IHCs', using a highly trained third-party super coder as a control group for accuracy. NNLPA performance metrics were assessed via an F1 score, utilizing precision and recall. Contingency tables were used to determine the sensitivity, specificity, positive predictive value, and negative predictive value. Furthermore, chi-square testing was performed to assess the independence of the metrics between the NNLPA and IHC groups.
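The metrics named in the Methods can be illustrated with a minimal Python sketch. It is not the authors' implementation: the counts are hypothetical, the names `Contingency`, `metrics`, and `chi_square_2x2` are invented for illustration, and the chi-square layout (coder by correct/incorrect outcome, no continuity correction) is an assumption, since the abstract does not say how the test was set up.

```python
import math
from dataclasses import dataclass


@dataclass
class Contingency:
    """2x2 counts for one CPT code, judged against the reference coder."""
    tp: int  # code assigned and present in the reference
    fp: int  # code assigned but absent from the reference
    fn: int  # code missed but present in the reference
    tn: int  # code correctly not assigned


def safe_div(num: float, den: float) -> float:
    """Return num / den, or 0.0 when the denominator is zero."""
    return num / den if den else 0.0


def metrics(c: Contingency) -> dict:
    """Derive the standard diagnostic metrics from one contingency table."""
    precision = safe_div(c.tp, c.tp + c.fp)  # identical to PPV
    recall = safe_div(c.tp, c.tp + c.fn)     # identical to sensitivity
    return {
        "sensitivity": recall,
        "specificity": safe_div(c.tn, c.tn + c.fp),
        "ppv": precision,
        "npv": safe_div(c.tn, c.tn + c.fn),
        "accuracy": safe_div(c.tp + c.tn, c.tp + c.fp + c.fn + c.tn),
        "f1": safe_div(2 * precision * recall, precision + recall),
    }


def chi_square_2x2(a: int, b: int, c: int, d: int) -> tuple:
    """Pearson chi-square test of independence for the table [[a, b], [c, d]].

    Returns (statistic, p_value). With 1 degree of freedom the upper-tail
    probability equals erfc(sqrt(statistic / 2)).
    """
    n = a + b + c + d
    den = (a + b) * (c + d) * (a + c) * (b + d)
    if den == 0:
        return 0.0, 1.0  # a margin is empty, so the test is uninformative
    stat = n * (a * d - b * c) ** 2 / den
    return stat, math.erfc(math.sqrt(stat / 2))


# Hypothetical counts for a single code (not taken from the study).
m = metrics(Contingency(tp=8, fp=2, fn=2, tn=28))

# Hypothetical rows: coder A (40 correct, 10 incorrect), coder B (25, 25).
stat, p = chi_square_2x2(40, 10, 25, 25)

print({k: round(v, 3) for k, v in m.items()})
print(round(stat, 3), p < 0.05)
```

Precision and positive predictive value are the same quantity, as are recall and sensitivity, which is why the F1 score can be read directly off the contingency table.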

Results: Overall, 200 operative reports were assessed in this study, and 192 CPT codes (88 cervical, 104 lumbar) were identified. NNLPA and IHC weighted mean F1 scores for lumbar spine surgery coding were 0.84 and 0.56, respectively (p < 0.05). Weighted mean sensitivity, specificity, and accuracy of NNLPA coding were 0.79, 0.99, and 0.98, respectively, and 0.59, 0.97, and 0.96 (p < 0.05) for IHCs. The NNLPA and IHC weighted mean F1 scores for cervical spine surgery coding were 0.73 and 0.68, respectively (p < 0.05). Mean specificity and accuracy for NNLPA coding were 0.99 and 0.95, respectively (p < 0.05), and 0.89 and 0.89 for IHCs (p < 0.05).
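The abstract does not define how the "weighted mean" was computed. One common reading, assumed here, weights each code's score by its support (the number of reference instances of that code). The sketch below uses placeholder code labels and made-up scores purely to show the arithmetic and how it differs from an unweighted (macro) mean.

```python
def weighted_mean(values, weights):
    """Mean of values, each weighted by the matching entry in weights."""
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must sum to a positive number")
    return sum(v * w for v, w in zip(values, weights)) / total


# Hypothetical per-code F1 scores and supports (not data from the study).
per_code_f1 = {"code_A": 0.90, "code_B": 0.60, "code_C": 0.75}
support = {"code_A": 10, "code_B": 5, "code_C": 5}

codes = sorted(per_code_f1)
weighted_f1 = weighted_mean(
    [per_code_f1[c] for c in codes],
    [support[c] for c in codes],
)
macro_f1 = sum(per_code_f1.values()) / len(per_code_f1)

print(round(weighted_f1, 4), round(macro_f1, 4))
```

Under this weighting, frequently performed procedures dominate the summary score, so a coder that does well on common codes but poorly on rare ones scores higher than it would under a macro average.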

Conclusions: NNLPA performance was noninferior, and possibly superior, to IHC performance at spine surgery medical coding. This result contributes to the growing body of literature regarding integration of AI in spine surgery and other clinical applications. Further studies are needed to quantify the cost savings associated with using a natural language processing platform for coding compared with human coders.

Source journal

Journal of neurosurgery. Spine (Medicine: Clinical Neurology)

CiteScore: 5.10
Self-citation rate: 10.70%
Articles published: 396
Review time: 6 months

Journal description: Primarily publishes original works in neurosurgery, but also includes studies in clinical neurophysiology, organic neurology, ophthalmology, radiology, pathology, and molecular biology.