Enhancing substance use detection in clinical notes with large language models

IF 3.6 · Region 2 (Medicine) · JCR Q1 (Psychiatry)
Fabrice Harel-Canada, Anabel Salimian, Brandon Moghanian, Sarah Clingan, Allan Nguyen, Tucker Avra, Michelle Poimboeuf, Ruby Romero, Arthur Funnell, Panayiotis Petousis, Michael Shin, Nanyun Peng, Chelsea L. Shover, David Goodman-Meza
Drug and Alcohol Dependence, volume 276, Article 112888. Published 2025-09-30.
DOI: 10.1016/j.drugalcdep.2025.112888
https://www.sciencedirect.com/science/article/pii/S0376871625003412
Citations: 0

Abstract

Enhancing substance use detection in clinical notes with large language models
Identifying substance use behaviors in electronic health records (EHRs) is challenging because critical details are often buried in unstructured notes that use varied terminology and negation, requiring careful contextual interpretation to distinguish relevant use from historical mentions or denials. Using MIMIC-III/IV discharge summaries, we created a large, annotated drug detection dataset to tackle this problem and support future systemic substance use surveillance. We then investigated the performance of multiple large language models (LLMs) for detecting eight substance use categories within this data. Evaluating models in zero-shot, few-shot, and fine-tuning configurations, we found that a fine-tuned model, Llama-DrugDetector-70B, outperformed others. It achieved near-perfect F1-scores (≥0.95) for most individual substances and strong scores for more complex tasks like prescription opioid misuse (F1=0.815) and polysubstance use (F1=0.917). These findings demonstrated that LLMs significantly enhance detection, showing promise for clinical decision support and research, although further work on scalability is warranted.
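The abstract reports one F1-score per substance category. As a rough illustration of how such per-category scores can be computed when each note may carry several labels, here is a minimal sketch. The label names, the toy notes, and the `per_label_f1` helper are hypothetical and are not taken from the paper, which does not describe its evaluation code in this abstract.

```python
from typing import Dict, List, Set


def per_label_f1(
    gold: List[Set[str]], pred: List[Set[str]], labels: List[str]
) -> Dict[str, float]:
    """Compute F1 = 2*TP / (2*TP + FP + FN) separately for each label."""
    scores: Dict[str, float] = {}
    for label in labels:
        pairs = list(zip(gold, pred))
        tp = sum(1 for g, p in pairs if label in g and label in p)
        fp = sum(1 for g, p in pairs if label not in g and label in p)
        fn = sum(1 for g, p in pairs if label in g and label not in p)
        denom = 2 * tp + fp + fn
        # Convention chosen here: F1 is 0.0 when the label never appears.
        scores[label] = 2 * tp / denom if denom else 0.0
    return scores


# Hypothetical toy data: one set of gold labels and one set of
# predicted labels per discharge summary.
LABELS = ["opioid", "alcohol"]
gold = [{"opioid"}, {"alcohol"}, {"opioid", "alcohol"}, set()]
pred = [{"opioid"}, set(), {"opioid", "alcohol"}, {"opioid"}]

scores = per_label_f1(gold, pred, LABELS)
for label, f1 in scores.items():
    print(f"{label}: F1={f1:.3f}")
```

Scoring each category on its own, rather than pooling all labels, keeps a rare or difficult category (such as prescription opioid misuse in the abstract) from being masked by strong results on common ones.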
Source journal
Drug and Alcohol Dependence (Medicine: Psychiatry)
CiteScore: 7.40
Self-citation rate: 7.10%
Articles published: 409
Review time: 41 days
About the journal: Drug and Alcohol Dependence is an international journal devoted to publishing original research, scholarly reviews, commentaries, and policy analyses in the area of drug, alcohol and tobacco use and dependence. Articles range from studies of the chemistry of substances of abuse, their actions at molecular and cellular sites, in vitro and in vivo investigations of their biochemical, pharmacological and behavioural actions, laboratory-based and clinical research in humans, substance abuse treatment and prevention research, and studies employing methods from epidemiology, sociology, and economics.