Enhancing substance use detection in clinical notes with large language models

IF 3.6 | Region 2 (Medicine) | JCR Q1 (Psychiatry)

Drug and alcohol dependence Pub Date : 2025-09-30 DOI:10.1016/j.drugalcdep.2025.112888

Fabrice Harel-Canada, Anabel Salimian, Brandon Moghanian, Sarah Clingan, Allan Nguyen, Tucker Avra, Michelle Poimboeuf, Ruby Romero, Arthur Funnell, Panayiotis Petousis, Michael Shin, Nanyun Peng, Chelsea L. Shover, David Goodman-Meza

Volume 276, Article 112888. Full text: https://www.sciencedirect.com/science/article/pii/S0376871625003412

Citations: 0

Abstract


Enhancing substance use detection in clinical notes with large language models

Identifying substance use behaviors in electronic health records (EHRs) is challenging because critical details are often buried in unstructured notes that use varied terminology and negation, requiring careful contextual interpretation to distinguish relevant use from historical mentions or denials. Using MIMIC-III/IV discharge summaries, we created a large, annotated drug detection dataset to tackle this problem and support future systemic substance use surveillance. We then investigated the performance of multiple large language models (LLMs) for detecting eight substance use categories within this data. Evaluating models in zero-shot, few-shot, and fine-tuning configurations, we found that a fine-tuned model, Llama-DrugDetector-70B, outperformed others. It achieved near-perfect F1-scores (≥ 0.95) for most individual substances and strong scores for more complex tasks like prescription opioid misuse (F1=0.815) and polysubstance use (F1=0.917). These findings demonstrated that LLMs significantly enhance detection, showing promise for clinical decision support and research, although further work on scalability is warranted.
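For readers unfamiliar with the metric, the per-category F1 scores quoted in the abstract can be illustrated with a minimal sketch. This is not the authors' evaluation code: the category names, the toy labels, and the helper names `f1_score` and `per_category_f1` are all assumptions made for illustration, and the abstract does not say how the authors computed or averaged their scores.

```python
from typing import Dict, List


def f1_score(y_true: List[int], y_pred: List[int]) -> float:
    """Binary F1 = 2*TP / (2*TP + FP + FN); 0.0 if there are no positives at all."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def per_category_f1(
    gold: Dict[str, List[int]], pred: Dict[str, List[int]]
) -> Dict[str, float]:
    """Score each substance category independently (one binary task per category)."""
    return {cat: f1_score(gold[cat], pred[cat]) for cat in gold}


# Hypothetical toy data: one 0/1 label per discharge summary, per category.
gold = {"alcohol": [1, 0, 1, 1, 0], "opioid_misuse": [0, 1, 1, 0, 0]}
pred = {"alcohol": [1, 0, 1, 0, 0], "opioid_misuse": [0, 1, 0, 1, 0]}

scores = per_category_f1(gold, pred)
for category, score in scores.items():
    print(f"{category}: F1={score:.3f}")
```

Treating each category as its own binary task is one common way to report multi-label results; whether the paper follows exactly this scheme is not stated in the abstract.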


Source journal

Drug and alcohol dependence (Medicine: Psychiatry)

CiteScore: 7.40

Self-citation rate: 7.10%

Articles published: 409

Review time: 41 days

Journal description: Drug and Alcohol Dependence is an international journal devoted to publishing original research, scholarly reviews, commentaries, and policy analyses in the area of drug, alcohol and tobacco use and dependence. Articles range from studies of the chemistry of substances of abuse, their actions at molecular and cellular sites, in vitro and in vivo investigations of their biochemical, pharmacological and behavioural actions, laboratory-based and clinical research in humans, substance abuse treatment and prevention research, and studies employing methods from epidemiology, sociology, and economics.