Fateme Nateghi Haredasht, Ivan Lopez, Steven Tate, Pooya Ashtari, Min Min Chan, Deepali Kulkarni, Chwen-Yuen Angie Chen, Maithri Vangala, Kira Griffith, Bryan Bunning, Adam S Miner, Tina Hernandez-Boussard, Keith Humphreys, Anna Lembke, L Alexander Vance, Jonathan H Chen
Predicting treatment retention in medication for opioid use disorder: a machine learning approach using NLP and LLM-derived clinical features.
Journal of the American Medical Informatics Association. Published 2025-09-22. DOI: 10.1093/jamia/ocaf157 (https://doi.org/10.1093/jamia/ocaf157)
Abstract
Objective: Building upon our previous work on predicting treatment retention in medications for opioid use disorder, we aimed to improve 6-month retention prediction in buprenorphine-naloxone (BUP-NAL) therapy by incorporating features derived from large language models (LLMs) applied to unstructured clinical notes.
Materials and methods: We used de-identified electronic health record (EHR) data from Stanford Health Care (STARR) for model development and internal validation, and the NeuroBlu behavioral health database for external validation. Structured features were supplemented with 13 clinical and psychosocial features extracted from free-text notes using the CLinical Entity Augmented Retrieval pipeline, which combines named entity recognition with LLM-based classification to provide contextual interpretation. We trained classification (Logistic Regression, Random Forest, XGBoost) and survival models (CoxPH, Random Survival Forest, Survival XGBoost), evaluated using Receiver Operating Characteristic-Area Under the Curve (ROC-AUC) and C-index.
Results: XGBoost achieved the highest classification performance (ROC-AUC = 0.65). Incorporating LLM-derived features improved model performance across all architectures, with the largest gains observed in simpler models such as Logistic Regression. In time-to-event analysis, Random Survival Forest and Survival XGBoost reached the highest C-index (≈0.65). SHapley Additive exPlanations analysis identified LLM-extracted features like Chronic Pain, Liver Disease, and Major Depression as key predictors. We also developed an interactive web tool for real-time clinical use.
Discussion: Features extracted using NLP and LLM-assisted methods improved model accuracy and interpretability, revealing valuable psychosocial risks not captured in structured EHRs.
Conclusion: Combining structured EHR data with LLM-extracted features moderately improves BUP-NAL retention prediction, enabling personalized risk stratification and advancing AI-driven care for substance use disorders.
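The two evaluation metrics named in the methods, ROC-AUC and the C-index, can be illustrated with a minimal sketch. This is not the authors' code: the function names `roc_auc` and `c_index` and the toy data are hypothetical, ties in survival time are simply skipped, and a real analysis would normally rely on an established library rather than these hand-written loops.

```python
from itertools import combinations


def roc_auc(labels, scores):
    """ROC-AUC as the probability that a random positive outscores a random negative.

    Tied scores count as half a win (Mann-Whitney formulation).
    """
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative label")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


def c_index(times, events, risks):
    """Harrell's concordance index for right-censored data.

    A pair is comparable when the subject with the shorter time had the event.
    It is concordant when that subject also has the higher predicted risk.
    """
    concordant = 0.0
    comparable = 0
    for i, j in combinations(range(len(times)), 2):
        if times[i] == times[j]:
            continue  # tied times are skipped in this simplified sketch
        first, second = (i, j) if times[i] < times[j] else (j, i)
        if not events[first]:
            continue  # earlier subject was censored, so the ordering is unknown
        comparable += 1
        if risks[first] > risks[second]:
            concordant += 1.0
        elif risks[first] == risks[second]:
            concordant += 0.5
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


# Toy classification data (hypothetical): label 1 = positive class, score = model output.
print(roc_auc([1, 0, 1, 0], [0.9, 0.4, 0.35, 0.2]))  # 0.75

# Toy survival data (hypothetical): days to discontinuation, event flag, predicted risk.
print(c_index([30, 90, 150, 180], [1, 1, 0, 0], [0.8, 0.3, 0.5, 0.1]))  # 0.8
```

Both metrics share the same reading: 0.5 corresponds to random ordering and 1.0 to perfect ordering, which is why the reported values of about 0.65 indicate a modest but real ability to rank patients by retention risk.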
About the journal:
JAMIA is AMIA's premier peer-reviewed journal for biomedical and health informatics. Covering the full spectrum of activities in the field, JAMIA includes informatics articles in the areas of clinical care, clinical research, translational science, implementation science, imaging, education, consumer health, public health, and policy. JAMIA's articles describe innovative informatics research and systems that help to advance biomedical science and to promote health. Case reports, perspectives and reviews also help readers stay connected with the most important informatics developments in implementation, policy and education.