Beyond the ‘black box’: choosing interpretable machine learning models for predicting postoperative opioid trends

IF 7.5 · Q1 ANESTHESIOLOGY · Medicine, Tier 1
Anaesthesia · Pub Date: 2025-02-02 · DOI: 10.1111/anae.16553
Seshadri C. Mudumbai, James Baurley, Caitlin E. Coombes, Randall S. Stafford, Edward R. Mariano

Abstract

Artificial intelligence encompasses machine learning and is a popular, yet controversial, topic in healthcare. Recent guidelines from national regulatory agencies underscore the critical importance of interpretability in machine learning models used in healthcare [1]. ‘Interpretability’ means that clinicians understand the reasoning behind a model's predictions, fostering trust and enabling informed clinical decision-making [Doshi-Velez et al. preprint, https://arxiv.org/abs/1702.08608]. In response to the opioid epidemic, there has been interest in using machine learning models to predict which patients will have the highest risk of postoperative opioid dependence. For a model to be interpretable, clinicians should be able to see which specific factors (e.g. previous opioid use, type of surgery or mental health conditions) contribute to its predictions. Experts have advocated for building inherently interpretable models from the start, especially in high-stakes medical contexts, rather than retrofitting explanations onto complex models after development [2]. As machine learning algorithms become integral to peri-operative management, balancing model complexity with interpretability is crucial [3]. The objective of this study was to evaluate whether simpler, more interpretable models could match complex ones in predictive accuracy and in identifying key predictors for postoperative opioid use.

Following institutional review board approval, we conducted a retrospective cohort study at a US Veterans Affairs hospital. We included adult patients who had surgery from 2015 to 2021 and had documented pre-operative and post-discharge opioid prescriptions. Patients without complete opioid prescription data were excluded.

Baseline data were extracted from electronic health records and included: patient characteristics; clinical variables (such as type of surgery and duration of hospital stay); and mental health diagnoses. We assessed three outcomes, with mean daily morphine milligram equivalents (MME) as the primary outcome and variance in MME and monthly rate of change in MME as secondary outcomes; these were all measured over 12 months before surgery and post-discharge. Opioid prescriptions were converted to MME, and mental health diagnoses were identified using International Classification of Diseases, 10th revision (ICD-10) codes as described in previous studies [4].
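The three outcomes above can be computed directly from a patient's monthly MME series. The sketch below is illustrative only (the study's analyses were done in R, and the input series is hypothetical): mean MME, variance in MME, and monthly rate of change taken as the ordinary least-squares slope against month index.

```python
# Illustrative sketch, not the authors' code: the three study outcomes
# (mean, variance, monthly rate of change) from 12 monthly MME totals.
# The interpretation of "rate of change" as an OLS slope is an assumption.
def mme_outcomes(monthly_mme):
    n = len(monthly_mme)
    mean_mme = sum(monthly_mme) / n
    variance = sum((x - mean_mme) ** 2 for x in monthly_mme) / n
    # Monthly rate of change: least-squares slope of MME against month index.
    months = range(n)
    mean_m = sum(months) / n
    cov = sum((m - mean_m) * (x - mean_mme) for m, x in zip(months, monthly_mme))
    var_m = sum((m - mean_m) ** 2 for m in months)
    slope = cov / var_m
    return mean_mme, variance, slope

# A flat 12-month series: non-zero mean, zero variance, zero slope.
print(mme_outcomes([90.0] * 12))  # (90.0, 0.0, 0.0)
```

A rising series (e.g. escalating prescriptions) would instead return a positive slope, which is what the monthly-rate-of-change outcome is meant to capture.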

We developed three machine learning models to predict postoperative opioid use: lasso regression, which enhances accuracy and interpretability through variable selection and regularisation; decision tree, which predicts outcomes using interpretable decision rules inferred from data; and extreme gradient boosting (XGBoost), an ensemble method known for high predictive performance but lower interpretability [5].
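Lasso's interpretability comes from its tendency to set weak coefficients exactly to zero. A minimal sketch of that mechanism: in the special case of standardised, uncorrelated predictors, the lasso solution is soft-thresholding of the ordinary least-squares coefficients. The coefficient values and penalty below are hypothetical, chosen only to show the selection effect.

```python
# Why lasso is interpretable: soft-thresholding zeroes out weak predictors.
# (Exact for orthonormal designs; real fits use coordinate descent.)
def soft_threshold(beta_ols, lam):
    """Lasso coefficient given an OLS coefficient and penalty lam."""
    if beta_ols > lam:
        return beta_ols - lam
    if beta_ols < -lam:
        return beta_ols + lam
    return 0.0  # weak effects are removed entirely

# Hypothetical OLS coefficients for three predictors.
ols = {"pre_op_mme": 0.9, "surgery_type": 0.4, "noise_feature": 0.05}
lasso = {k: soft_threshold(b, lam=0.1) for k, b in ols.items()}
print(lasso)  # noise_feature is shrunk to exactly 0.0
```

The surviving non-zero coefficients form a short, readable list of predictors, which is the property the letter contrasts with XGBoost's ensemble of hundreds of trees.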

Analyses were performed using RStudio (version 12.0, R Foundation for Statistical Computing, Vienna, Austria) involving two scenarios: models were trained using only baseline predictors without pre-operative opioid use data; and models included all baseline predictors plus pre-operative opioid use metrics. We utilised the interpretable machine learning package for feature importance analysis, with the rpart and XGBoost packages used for model implementation. Hyperparameters were optimised via grid search and cross-validation. Ten-fold cross-validation minimised overfitting and assessed generalisability. The primary evaluation metric was root mean squared error (RMSE) and mean absolute error (MAE) was also calculated. Feature importance was determined by coefficient magnitude (lasso regression), tree structure (decision tree) and built-in importance measures (XGBoost), with p < 0.05 defined as statistically significant.
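The evaluation loop described above can be sketched as follows. This is a hedged Python analogue of the R workflow, using a trivial mean predictor as a stand-in model; it shows only how 10-fold cross-validated RMSE and MAE are pooled, not the study's actual models.

```python
# Hedged sketch of 10-fold cross-validation with RMSE and MAE.
# The "model" here is a stand-in that predicts the training-set mean.
import math
import random

def kfold_rmse_mae(y, k=10, seed=0):
    """Return pooled (RMSE, MAE) over k held-out folds."""
    idx = list(range(len(y)))
    random.Random(seed).shuffle(idx)
    sq_errs, abs_errs = [], []
    for f in range(k):
        test = set(idx[f::k])                      # every k-th shuffled index
        train = [y[i] for i in idx if i not in test]
        pred = sum(train) / len(train)             # stand-in model: train mean
        for i in test:
            sq_errs.append((y[i] - pred) ** 2)
            abs_errs.append(abs(y[i] - pred))
    rmse = math.sqrt(sum(sq_errs) / len(sq_errs))
    mae = sum(abs_errs) / len(abs_errs)
    return rmse, mae
```

By construction RMSE is never below MAE, and the gap between the two grows with large outlying errors, which is why the letter reports both for a heavily skewed outcome such as MME.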

The study cohort consisted of 1396 patients who were predominantly male (93.6%), aged > 70 y (49.4%) and White (77.4%) (online Supporting Information Table S1). Half of the cohort had a diagnosed mental illness, with major depression (58.2%) and substance use disorder (27.9%) being most prevalent. Surgery type varied, with orthopaedics (19.3%) and ophthalmology (9.3%) being common. The mean (SD) pre-operative MME was 681 (1340), indicating significant opioid use before surgery.

Including pre-operative opioid metrics enhanced predictive accuracy across all models (Table 1). Lasso regression showed the greatest improvement (RMSE 1263 to 711, MAE 726 to 350, p < 0.01), followed by decision tree (RMSE 1286 to 787, MAE 709 to 363, p < 0.01), while XGBoost showed modest improvements (RMSE 1352 to 1168, MAE 600 to 528, p < 0.05). For secondary outcomes (online Supporting Information Table S2), models showed modest improvements in predicting opioid use variance, with XGBoost performing best (RMSE 2,540,888 to 2,299,404), while improvements in predicting monthly rate of change were minimal.

Table 1. Comparison of machine learning model performance in predicting post-discharge mean opioid use with and without pre-operative opioid data. Models were trained on two sets of predictors: baseline predictors only (e.g. demographics, surgery type, duration of hospital stay and mental health diagnoses); and baseline plus pre-operative opioid metrics (12-month pre-operative opioid usage). Values are mean MME prediction errors; lower values indicate better predictive performance.

Model              Metric   Baseline predictors   + Pre-operative opioid metrics
Lasso regression   RMSE     1263                  711
                   MAE      726                   350
Decision tree      RMSE     1286                  787
                   MAE      709                   363
XGBoost            RMSE     1352                  1168
                   MAE      600                   528

MME, morphine milligram equivalent; RMSE, root mean squared error; MAE, mean absolute error.

Feature importance analysis revealed differences among models (online Supporting Information Figure S1). While XGBoost heavily weighted pre-operative mean MME, emphasising reliance on previous opioid use patterns, decision tree and lasso regression identified additional important predictors. Decision tree highlighted surgical type and duration of hospital stay alongside pre-operative opioid metrics, while lasso regression emphasised mental health diagnoses and duration of hospital stay as influential predictors.
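For the lasso model, 'coefficient magnitude' importance reduces to ranking predictors by the absolute value of their fitted coefficients. The sketch below is illustrative only; the coefficient values are hypothetical and are not the study's results.

```python
# Illustrative only: deriving a feature-importance ranking from lasso
# coefficients by absolute magnitude. All values below are hypothetical.
coefs = {
    "pre_op_mean_mme": 0.82,
    "mental_health_dx": -0.41,   # sign shows direction; magnitude shows importance
    "length_of_stay": 0.33,
    "surgery_type_ortho": 0.12,
    "age_over_70": 0.0,          # zeroed out by the lasso penalty
}
importance = sorted(coefs.items(), key=lambda kv: abs(kv[1]), reverse=True)
for name, c in importance:
    print(f"{name:20s} |coef| = {abs(c):.2f}")
```

Note this ranking is only meaningful when predictors are on a common (standardised) scale; otherwise coefficient magnitude conflates effect size with units.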

Our study shows that simpler models can predict postoperative opioid trends effectively and provide valuable insights into key predictors [6]. Notably, lasso regression and decision tree models identified clinically relevant factors beyond opioid use history, achieving comparable accuracy while offering greater potential interpretability. The lack of transparency in complex models may limit clinical adoption and needs further evaluation.

The Veterans Affairs healthcare population is known to have higher prevalence rates of mental illness, pre-operative opioid use and substance use disorders compared with typical surgical populations in the USA. Recent studies of general surgical populations in the USA report pre-operative opioid use rates of 10–30% and mental health diagnosis rates of 10–35%, compared with rates in our cohort of 78% and > 50%, respectively [6-8]. The 94.1% prevalence of chronic pain in our cohort is also notably higher than in general surgical populations (typically 25–40%). These characteristics of US veterans, particularly among those seeking surgical care at VA healthcare facilities, are important to note [4, 7]. While these features may limit the generalisability of any inferences, from the perspective of our study purpose, higher prevalence rates contribute to a richer dataset for evaluating and comparing the ability of these models to identify complex predictor relationships. Based on our results, prioritising simpler models and interpretability may enhance clinical utility without compromising performance [9]. Multicentre evaluation involving more diverse surgical populations will be necessary to validate these findings and assess model interpretability needs across different clinical settings.
