Evaluation of Feature Selection Methods for Preserving Machine Learning Performance in the Presence of Temporal Dataset Shift in Clinical Medicine.

IF 1.8 4区 医学 Q3 COMPUTER SCIENCE, INFORMATION SYSTEMS
Joshua Lemmon, Lin Lawrence Guo, Jose Posada, Stephen R Pfohl, Jason Fries, Scott Lanyon Fleming, Catherine Aftandilian, Nigam Shah, Lillian Sung
{"title":"Evaluation of Feature Selection Methods for Preserving Machine Learning Performance in the Presence of Temporal Dataset Shift in Clinical Medicine.","authors":"Joshua Lemmon,&nbsp;Lin Lawrence Guo,&nbsp;Jose Posada,&nbsp;Stephen R Pfohl,&nbsp;Jason Fries,&nbsp;Scott Lanyon Fleming,&nbsp;Catherine Aftandilian,&nbsp;Nigam Shah,&nbsp;Lillian Sung","doi":"10.1055/s-0043-1762904","DOIUrl":null,"url":null,"abstract":"<p><strong>Background: </strong>Temporal dataset shift can cause degradation in model performance as discrepancies between training and deployment data grow over time. The primary objective was to determine whether parsimonious models produced by specific feature selection methods are more robust to temporal dataset shift as measured by out-of-distribution (OOD) performance, while maintaining in-distribution (ID) performance.</p><p><strong>Methods: </strong>Our dataset consisted of intensive care unit patients from MIMIC-IV categorized by year groups (2008-2010, 2011-2013, 2014-2016, and 2017-2019). We trained baseline models using L2-regularized logistic regression on 2008-2010 to predict in-hospital mortality, long length of stay (LOS), sepsis, and invasive ventilation in all year groups. We evaluated three feature selection methods: L1-regularized logistic regression (L1), Remove and Retrain (ROAR), and causal feature selection. We assessed whether a feature selection method could maintain ID performance (2008-2010) and improve OOD performance (2017-2019). We also assessed whether parsimonious models retrained on OOD data performed as well as oracle models trained on all features in the OOD year group.</p><p><strong>Results: </strong>The baseline model showed significantly worse OOD performance with the long LOS and sepsis tasks when compared with the ID performance. L1 and ROAR retained 3.7 to 12.6% of all features, whereas causal feature selection generally retained fewer features. Models produced by L1 and ROAR exhibited similar ID and OOD performance as the baseline models. The retraining of these models on 2017-2019 data using features selected from training on 2008-2010 data generally reached parity with oracle models trained directly on 2017-2019 data using all available features. Causal feature selection led to heterogeneous results with the superset maintaining ID performance while improving OOD calibration only on the long LOS task.</p><p><strong>Conclusions: </strong>While model retraining can mitigate the impact of temporal dataset shift on parsimonious models produced by L1 and ROAR, new methods are required to proactively improve temporal robustness.</p>","PeriodicalId":49822,"journal":{"name":"Methods of Information in Medicine","volume":"62 1-02","pages":"60-70"},"PeriodicalIF":1.8000,"publicationDate":"2023-05-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Methods of Information in Medicine","FirstCategoryId":"3","ListUrlMain":"https://doi.org/10.1055/s-0043-1762904","RegionNum":4,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q3","JCRName":"COMPUTER SCIENCE, INFORMATION SYSTEMS","Score":null,"Total":0}
引用次数: 0

Abstract

Background: Temporal dataset shift can cause degradation in model performance as discrepancies between training and deployment data grow over time. The primary objective was to determine whether parsimonious models produced by specific feature selection methods are more robust to temporal dataset shift as measured by out-of-distribution (OOD) performance, while maintaining in-distribution (ID) performance.

Methods: Our dataset consisted of intensive care unit patients from MIMIC-IV categorized by year groups (2008-2010, 2011-2013, 2014-2016, and 2017-2019). We trained baseline models using L2-regularized logistic regression on 2008-2010 to predict in-hospital mortality, long length of stay (LOS), sepsis, and invasive ventilation in all year groups. We evaluated three feature selection methods: L1-regularized logistic regression (L1), Remove and Retrain (ROAR), and causal feature selection. We assessed whether a feature selection method could maintain ID performance (2008-2010) and improve OOD performance (2017-2019). We also assessed whether parsimonious models retrained on OOD data performed as well as oracle models trained on all features in the OOD year group.

Results: The baseline model showed significantly worse OOD performance with the long LOS and sepsis tasks when compared with the ID performance. L1 and ROAR retained 3.7 to 12.6% of all features, whereas causal feature selection generally retained fewer features. Models produced by L1 and ROAR exhibited similar ID and OOD performance as the baseline models. The retraining of these models on 2017-2019 data using features selected from training on 2008-2010 data generally reached parity with oracle models trained directly on 2017-2019 data using all available features. Causal feature selection led to heterogeneous results with the superset maintaining ID performance while improving OOD calibration only on the long LOS task.

Conclusions: While model retraining can mitigate the impact of temporal dataset shift on parsimonious models produced by L1 and ROAR, new methods are required to proactively improve temporal robustness.

临床医学中存在时间数据转移时保持机器学习性能的特征选择方法评价。
背景:随着时间的推移,训练数据和部署数据之间的差异越来越大,时间数据转移会导致模型性能下降。主要目的是确定由特定特征选择方法产生的简约模型是否对分布外(OOD)性能测量的时间数据集移动更具鲁棒性,同时保持分布内(ID)性能。方法:我们的数据集包括MIMIC-IV重症监护病房患者,按年份组(2008-2010年、2011-2013年、2014-2016年和2017-2019年)进行分类。我们在2008-2010年使用l2正则化逻辑回归训练基线模型来预测所有年份组的住院死亡率、住院时间(LOS)、败血症和有创通气。我们评估了三种特征选择方法:L1正则化逻辑回归(L1)、去除和再训练(ROAR)和因果特征选择。我们评估了特征选择方法是否可以保持ID性能(2008-2010)并提高OOD性能(2017-2019)。我们还评估了在OOD数据上重新训练的简约模型是否与在OOD年份组的所有特征上训练的oracle模型一样好。结果:基线模型显示,与ID性能相比,长LOS和脓毒症任务的OOD性能明显较差。L1和ROAR保留了所有特征的3.7 - 12.6%,而因果特征选择通常保留较少的特征。L1和ROAR生成的模型具有与基线模型相似的ID和OOD性能。使用从2008-2010年数据训练中选择的特征在2017-2019年数据上对这些模型进行再训练,通常与使用所有可用特征直接在2017-2019年数据上训练的oracle模型相当。因果特征选择导致异构结果,超集保持ID性能,而仅在长LOS任务上改进OOD校准。结论:虽然模型再训练可以减轻时间数据转移对L1和ROAR生成的简约模型的影响,但需要新的方法来主动提高时间鲁棒性。
本文章由计算机程序翻译,如有差异,请以英文原文为准。
求助全文
约1分钟内获得全文 求助全文
来源期刊
Methods of Information in Medicine
Methods of Information in Medicine 医学-计算机:信息系统
CiteScore
3.70
自引率
11.80%
发文量
33
审稿时长
6-12 weeks
期刊介绍: Good medicine and good healthcare demand good information. Since the journal''s founding in 1962, Methods of Information in Medicine has stressed the methodology and scientific fundamentals of organizing, representing and analyzing data, information and knowledge in biomedicine and health care. Covering publications in the fields of biomedical and health informatics, medical biometry, and epidemiology, the journal publishes original papers, reviews, reports, opinion papers, editorials, and letters to the editor. From time to time, the journal publishes articles on particular focus themes as part of a journal''s issue.
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
确定
请完成安全验证×
copy
已复制链接
快去分享给好友吧!
我知道了
右上角分享
点击右上角分享
0
联系我们:info@booksci.cn Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。 Copyright © 2023 布克学术 All rights reserved.
京ICP备2023020795号-1
ghs 京公网安备 11010802042870号
Book学术文献互助
Book学术文献互助群
群 号:604180095
Book学术官方微信