Improving Drug Identification in Overdose Death Surveillance using Large Language Models.

ArXiv Pub Date : 2025-07-16

Arthur J Funnell, Panayiotis Petousis, Fabrice Harel-Canada, Ruby Romero, Alex A T Bui, Adam Koncsol, Hritika Chaturvedi, Chelsea Shover, David Goodman-Meza

Journal: ArXiv

Full text (PMC): https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12288657/pdf/

Abstract

The rising rate of drug-related deaths in the United States, largely driven by fentanyl, requires timely and accurate surveillance. However, critical overdose data are often buried in free-text coroner reports, leading to delays and information loss when the text is coded into International Classification of Diseases, 10th Revision (ICD-10) categories. Natural language processing (NLP) models may automate and enhance overdose surveillance, but prior applications have been limited.

A dataset of 35,433 death records from multiple U.S. jurisdictions in 2020 was used for model training and internal testing. External validation was conducted on a new, separate dataset of 3,335 records from 2023-2024. Multiple NLP approaches were evaluated for classifying specific drug involvement from unstructured death certificate text. These included traditional single- and multi-label classifiers, fine-tuned encoder-only language models such as Bidirectional Encoder Representations from Transformers (BERT) and BioClinicalBERT, and contemporary decoder-only large language models such as Qwen 3 and Llama 3. Model performance was assessed using macro-averaged F1 scores, and 95% confidence intervals were calculated to quantify uncertainty.

Fine-tuned BioClinicalBERT models achieved near-perfect performance, with macro F1 scores ≥0.998 on the internal test set. External validation confirmed robustness (macro F1 = 0.966), with BioClinicalBERT outperforming conventional machine learning, general-domain BERT models, and various decoder-only large language models.

NLP models, particularly fine-tuned clinical variants like BioClinicalBERT, offer a highly accurate and scalable solution for overdose death classification from free-text reports. These methods can significantly accelerate surveillance workflows, overcoming the limitations of manual ICD-10 coding and supporting near real-time detection of emerging substance use trends.
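As a rough illustration of the evaluation metric named in the abstract, the sketch below computes a macro-averaged F1 score for multi-label drug predictions and a 95% confidence interval around it. The abstract does not say how the intervals were obtained, so the percentile bootstrap used here is an assumption, not the authors' method. The function names, the three drug labels, and the toy records are all hypothetical.

```python
import random

def f1_for_label(y_true, y_pred, j):
    """F1 for a single label j, from per-record 0/1 indicator vectors."""
    tp = fp = fn = 0
    for t, p in zip(y_true, y_pred):
        if p[j] and t[j]:
            tp += 1
        elif p[j] and not t[j]:
            fp += 1
        elif t[j] and not p[j]:
            fn += 1
    denom = 2 * tp + fp + fn
    # Convention chosen here: F1 is 0.0 when the label never occurs or is predicted.
    return 2 * tp / denom if denom else 0.0

def macro_f1(y_true, y_pred):
    """Unweighted mean of the per-label F1 scores."""
    n_labels = len(y_true[0])
    return sum(f1_for_label(y_true, y_pred, j) for j in range(n_labels)) / n_labels

def bootstrap_ci(y_true, y_pred, n_resamples=1000, alpha=0.05, seed=0):
    """Percentile bootstrap interval for macro F1 (assumed method, see lead-in)."""
    rng = random.Random(seed)
    n = len(y_true)
    scores = []
    for _ in range(n_resamples):
        # Resample records with replacement, keeping truth and prediction paired.
        idx = [rng.randrange(n) for _ in range(n)]
        scores.append(macro_f1([y_true[i] for i in idx], [y_pred[i] for i in idx]))
    scores.sort()
    lower = scores[int((alpha / 2) * n_resamples)]
    upper = scores[min(n_resamples - 1, int((1 - alpha / 2) * n_resamples))]
    return lower, upper

# Hypothetical label order: [fentanyl, methamphetamine, cocaine]
y_true = [[1, 0, 0], [1, 1, 0], [0, 0, 1], [0, 1, 0]]
y_pred = [[1, 0, 0], [1, 0, 0], [0, 0, 1], [0, 1, 0]]

score = macro_f1(y_true, y_pred)
lower, upper = bootstrap_ci(y_true, y_pred)
print(f"macro F1 = {score:.3f}, 95% CI = ({lower:.3f}, {upper:.3f})")
```

Macro averaging gives every drug label equal weight regardless of how often it appears, so a rarely recorded substance affects the score as much as a common one. That property is presumably why it suits a task where some drugs occur in only a small share of records.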
