One Third of Alcohol Use Disorder Diagnoses are Missed by ICD Coding.

Substance use & addiction journal. Pub Date: 2025-04-01. Epub Date: 2024-11-07. DOI: 10.1177/29767342241288112

Laura Mercurio, Augusto Garcia, Stephanie Ruest, Susan J Duffy, Carsten Eickhoff

Substance use & addiction journal, pages 328-336. Publication type: Journal Article. Full text (PMC): https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12093275/pdf/


Abstract

Background/significance: Alcohol use carries significant morbidity and mortality, yet accurate identification of alcohol use disorder (AUD) remains a multi-layered problem for both researchers and clinicians.

Objective: To fine-tune a language model to identify AUD in the clinical narrative and to detect AUD cases not accounted for by ICD-9 coding in the MIMIC-III database.

Materials and methods: We applied clinicalBERT to unique patient discharge summaries. For classification, patients were divided into nonoverlapping groups, stratified by the presence or absence of an AUD ICD diagnosis, for model training (80%), validation (10%), and testing (10%). For detection, the model was trained (80%) and validated (20%) on a 1:1 sample of positive and negative patients, then applied to the remaining negative patient population. Physicians adjudicated 600 samples drawn from across the full spectrum of model confidence to confirm AUD by Diagnostic and Statistical Manual of Mental Disorders, Fifth Edition (DSM-5) criteria.
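
The two sampling schemes described in the methods can be sketched roughly as follows. This is only an illustration under stated assumptions, not the authors' code: the function names, the `patient_labels` mapping (patient ID to a 0/1 ICD-based AUD flag), and the fixed seed are all hypothetical, and the sketch covers only the patient-level split, not the clinicalBERT fine-tuning itself.

```python
import random
from collections import defaultdict


def stratified_split(patient_labels, fractions=(0.8, 0.1, 0.1), seed=0):
    """Split patient IDs into train/validation/test lists, stratified by label.

    patient_labels: dict of patient_id -> 0/1 (hypothetical ICD-based AUD flag).
    Each patient lands in exactly one list, so the groups do not overlap.
    """
    rng = random.Random(seed)
    by_label = defaultdict(list)
    for pid, label in patient_labels.items():
        by_label[label].append(pid)

    train, val, test = [], [], []
    for label in sorted(by_label):
        ids = sorted(by_label[label])  # sort first so the shuffle is reproducible
        rng.shuffle(ids)
        n_train = int(fractions[0] * len(ids))
        n_val = int(fractions[1] * len(ids))
        train += ids[:n_train]
        val += ids[n_train:n_train + n_val]
        test += ids[n_train + n_val:]  # remainder goes to the test set
    return train, val, test


def balanced_sample(patient_labels, seed=0):
    """Pair every ICD-positive patient with one randomly chosen ICD-negative patient.

    Returns (balanced_ids, remaining_negatives); the remaining negatives are the
    pool a trained detector would later be applied to.
    """
    rng = random.Random(seed)
    positives = sorted(p for p, y in patient_labels.items() if y == 1)
    negatives = sorted(p for p, y in patient_labels.items() if y == 0)
    rng.shuffle(negatives)
    sampled_neg = negatives[:len(positives)]
    remaining_neg = negatives[len(positives):]
    return positives + sampled_neg, remaining_neg
```

In this sketch, the balanced set returned by `balanced_sample` could then be divided 80/20 for training and validation by calling `stratified_split` on it with `fractions=(0.8, 0.2, 0.0)`.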

Results: The model achieved the following performance (mean, standard deviation): precision (0.9, 0.02), recall (0.65, 0.03), F1 (0.75, 0.02), area under the receiver operating characteristic curve (0.97, 0.01), and area under the precision-recall curve (0.86, 0.01). Adjudication produced an estimated 4% under-documentation rate for the total study population. As model confidence increased, the AUD under-documentation rate rose to 30% of the number of patients identified as positive by ICD-9 coding.
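
As a quick consistency check on the reported metrics, F1 is the harmonic mean of precision and recall. The short sketch below (the function name is hypothetical) applies that formula to the reported mean precision and recall. Because the abstract reports means across runs, the F1 of the mean values need not exactly equal the mean F1, but here the two agree to two decimal places.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall; returns 0.0 when both are zero."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Mean values reported in the abstract.
reported_precision = 0.9
reported_recall = 0.65

print(round(f1_score(reported_precision, reported_recall), 2))  # prints 0.75
```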

Conclusion: Our model improves the identification of patients meeting AUD criteria, outperforming ICD codes in detecting cases of AUD. The detection discrepancy between ICD coding and free text highlights clinician under-documentation, not under-recognition. Adjudication revealed model over-sensitivity to language around substance use, withdrawal, and chronic liver disease; future study requires application to a broader range of patient ages and acuity levels. This model has the potential to improve rapid identification of patients with AUD and enhance treatment allocation.

