Diagnostic framework to validate clinical machine learning models locally on temporally stamped data.

Communications Medicine (impact factor 5.4; JCR Q1, Medicine, Research & Experimental)

Pub Date: 2025-07-01. DOI: 10.1038/s43856-025-00965-w

Maximilian Schuessler, Scott Fleming, Shannon Meyer, Tina Seto, Tina Hernandez-Boussard

Communications Medicine, 2025; 5(1): 261. Full-text PDF (PMC): https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12219301/pdf/


Abstract

Background: Real-world medical environments such as oncology are highly dynamic due to rapid changes in medical practice, technologies, and patient characteristics. This variability, if not addressed, can produce data shifts that degrade model performance. At present, there are few easy-to-implement, model-agnostic diagnostic frameworks to vet machine learning models for future applicability and temporal consistency.

Methods: We extracted clinical data from electronic health records (EHR) for a cohort of over 24,000 patients who received antineoplastic therapy within a distinct year. The labels in this study are acute care utilization (ACU) events, i.e., emergency department visits and hospitalizations, within 180 days of treatment initiation. Our cross-sectional data span treatment initiation points from 2010 to 2022. We implemented three models within our validation framework: Least Absolute Shrinkage and Selection Operator (LASSO), Random Forest (RF), and Extreme Gradient Boosting (XGBoost).
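
The label definition above can be illustrated with a minimal sketch. The abstract does not state how the window boundaries are handled, so the inclusive 0 to 180 day window, the function name `acu_label`, and the example dates are all assumptions made for illustration, not the authors' implementation.

```python
from datetime import date, timedelta


def acu_label(treatment_start, acute_care_dates, window_days=180):
    """Return 1 if any acute care event (ED visit or hospitalization)
    falls within `window_days` of treatment start, else 0.

    Assumption: the window is inclusive on both ends (day 0 to day 180).
    """
    window_end = treatment_start + timedelta(days=window_days)
    return int(any(treatment_start <= d <= window_end for d in acute_care_dates))


# Hypothetical patient whose therapy starts on 2015-03-01.
start = date(2015, 3, 1)
print(acu_label(start, [start + timedelta(days=90)]))   # event inside the window
print(acu_label(start, [start + timedelta(days=200)]))  # event outside the window
```

Events before treatment initiation are ignored here; whether the study treats them differently is not specified in the abstract.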

Results: Here, we introduce a model-agnostic diagnostic framework to validate clinical machine learning models on time-stamped data, consisting of four stages. First, the framework evaluates performance by partitioning data from multiple years into training and validation cohorts. Second, it characterizes the temporal evolution of patient outcomes and characteristics. Third, model longevity and trade-offs between data quantity and recency are explored. Finally, feature importance and data valuation algorithms are applied for feature reduction and data quality assessment. When applied to predicting ACU in cancer patients, the framework highlights fluctuations in features, labels, and data values over time.
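
The first and third stages (temporal partitioning, and the trade-off between data quantity and recency) can be sketched roughly as follows. The abstract does not give the exact partitioning scheme, so the function `temporal_splits` and its parameters `min_train_years` and `train_window` are hypothetical; the only constraint taken from the text is that models are trained on earlier years and validated on later ones.

```python
def temporal_splits(years, min_train_years=3, train_window=None):
    """Yield (train_years, validation_year) pairs that never train on the future.

    train_window=None trains on all prior years (more data);
    train_window=k trains only on the k most recent prior years (more recent data).
    """
    ordered = sorted(set(years))
    for i in range(min_train_years, len(ordered)):
        prior = ordered[:i]
        if train_window is not None:
            prior = prior[-train_window:]
        yield prior, ordered[i]


# Treatment initiation years covered by the study: 2010 to 2022.
study_years = range(2010, 2023)
for train, val in temporal_splits(study_years, min_train_years=3, train_window=3):
    print(f"train {train[0]}-{train[-1]} -> validate {val}")
```

Comparing validation performance between `train_window=None` and a small `train_window` is one simple way to probe whether older data still helps or whether recent data alone is sufficient.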

Conclusions: The work in this study emphasizes the importance of data timeliness and relevance. The results on ACU in cancer patients show moderate signs of drift and corroborate the relevance of temporal considerations when validating machine learning models for deployment at the point of care.
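
One simple way to look for drift in the label is to track the event rate per treatment year. The sketch below is an assumption about how such a check could be done, not the method used in the study; the function name `yearly_event_rate` and the toy records are made up for illustration.

```python
from collections import defaultdict


def yearly_event_rate(records):
    """Compute the fraction of positive labels per year.

    records: iterable of (year, label) pairs with label in {0, 1}.
    Returns a dict {year: event_rate} ordered by year.
    """
    counts = defaultdict(lambda: [0, 0])  # year -> [events, patients]
    for year, label in records:
        counts[year][0] += label
        counts[year][1] += 1
    return {year: events / total for year, (events, total) in sorted(counts.items())}


# Toy, made-up records for illustration only (not data from the study).
toy = [(2010, 1), (2010, 0), (2010, 0), (2010, 0),
       (2011, 1), (2011, 1), (2011, 0), (2011, 0)]
rates = yearly_event_rate(toy)
print(rates)
```

A sustained change in these rates across years would be one signal that a model trained on older cohorts may need revalidation.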

