Maximilian Schuessler, Scott Fleming, Shannon Meyer, Tina Seto, Tina Hernandez-Boussard
{"title":"诊断框架,验证临床机器学习模型在本地的时间戳数据。","authors":"Maximilian Schuessler, Scott Fleming, Shannon Meyer, Tina Seto, Tina Hernandez-Boussard","doi":"10.1038/s43856-025-00965-w","DOIUrl":null,"url":null,"abstract":"<p><strong>Background: </strong>Real-world medical environments such as oncology are highly dynamic due to rapid changes in medical practice, technologies, and patient characteristics. This variability, if not addressed, can result in data shifts with potentially poor model performance. Presently, there are few easy-to-implement, model-agnostic diagnostic frameworks to vet machine learning models for future applicability and temporal consistency.</p><p><strong>Methods: </strong>We extracted clinical data from EHR for a cohort of over 24,000 patients who received antineoplastic therapy within a distinct year. The label of this study are acute care utilization (ACU) events, i.e., emergency department visits and hospitalizations, within 180 days of treatment initiation. Our cross-sectional data spans treatment initiation points from 2010-2022. We implemented three models within our validation framework: Least Absolute Shrinkage and Selection Operator (LASSO), Random Forest (RF), and Extreme Gradient Boosting (XGBoost).</p><p><strong>Results: </strong>Here, we introduce a model-agnostic diagnostic framework to validate clinical machine learning models on time-stamped data, consisting of four stages. First, the framework evaluates performance by partitioning data from multiple years into training and validation cohorts. Second, it characterizes the temporal evolution of patient outcomes and characteristics. Third, model longevity and trade-offs between data quantity and recency are explored. Finally, feature importance and data valuation algorithms are applied for feature reduction and data quality assessment. When applied to predicting ACU in cancer patients, the framework highlights fluctuations in features, labels, and data values over time.</p><p><strong>Conclusions: </strong>The work in this study emphasizes the importance of data timeliness and relevance. The results on ACU in cancer patients show moderate signs of drift and corroborate the relevance of temporal considerations when validating machine learning models for deployment at the point of care.</p>","PeriodicalId":72646,"journal":{"name":"Communications medicine","volume":"5 1","pages":"261"},"PeriodicalIF":5.4000,"publicationDate":"2025-07-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12219301/pdf/","citationCount":"0","resultStr":"{\"title\":\"Diagnostic framework to validate clinical machine learning models locally on temporally stamped data.\",\"authors\":\"Maximilian Schuessler, Scott Fleming, Shannon Meyer, Tina Seto, Tina Hernandez-Boussard\",\"doi\":\"10.1038/s43856-025-00965-w\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"<p><strong>Background: </strong>Real-world medical environments such as oncology are highly dynamic due to rapid changes in medical practice, technologies, and patient characteristics. This variability, if not addressed, can result in data shifts with potentially poor model performance. Presently, there are few easy-to-implement, model-agnostic diagnostic frameworks to vet machine learning models for future applicability and temporal consistency.</p><p><strong>Methods: </strong>We extracted clinical data from EHR for a cohort of over 24,000 patients who received antineoplastic therapy within a distinct year. The label of this study are acute care utilization (ACU) events, i.e., emergency department visits and hospitalizations, within 180 days of treatment initiation. Our cross-sectional data spans treatment initiation points from 2010-2022. We implemented three models within our validation framework: Least Absolute Shrinkage and Selection Operator (LASSO), Random Forest (RF), and Extreme Gradient Boosting (XGBoost).</p><p><strong>Results: </strong>Here, we introduce a model-agnostic diagnostic framework to validate clinical machine learning models on time-stamped data, consisting of four stages. First, the framework evaluates performance by partitioning data from multiple years into training and validation cohorts. Second, it characterizes the temporal evolution of patient outcomes and characteristics. Third, model longevity and trade-offs between data quantity and recency are explored. Finally, feature importance and data valuation algorithms are applied for feature reduction and data quality assessment. When applied to predicting ACU in cancer patients, the framework highlights fluctuations in features, labels, and data values over time.</p><p><strong>Conclusions: </strong>The work in this study emphasizes the importance of data timeliness and relevance. The results on ACU in cancer patients show moderate signs of drift and corroborate the relevance of temporal considerations when validating machine learning models for deployment at the point of care.</p>\",\"PeriodicalId\":72646,\"journal\":{\"name\":\"Communications medicine\",\"volume\":\"5 1\",\"pages\":\"261\"},\"PeriodicalIF\":5.4000,\"publicationDate\":\"2025-07-01\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12219301/pdf/\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Communications medicine\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1038/s43856-025-00965-w\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"Q1\",\"JCRName\":\"MEDICINE, RESEARCH & EXPERIMENTAL\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Communications medicine","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1038/s43856-025-00965-w","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"MEDICINE, RESEARCH & EXPERIMENTAL","Score":null,"Total":0}
Diagnostic framework to validate clinical machine learning models locally on temporally stamped data.
Background: Real-world medical environments such as oncology are highly dynamic due to rapid changes in medical practice, technologies, and patient characteristics. This variability, if not addressed, can result in data shifts with potentially poor model performance. Presently, there are few easy-to-implement, model-agnostic diagnostic frameworks to vet machine learning models for future applicability and temporal consistency.
Methods: We extracted clinical data from EHR for a cohort of over 24,000 patients who received antineoplastic therapy within a distinct year. The label of this study are acute care utilization (ACU) events, i.e., emergency department visits and hospitalizations, within 180 days of treatment initiation. Our cross-sectional data spans treatment initiation points from 2010-2022. We implemented three models within our validation framework: Least Absolute Shrinkage and Selection Operator (LASSO), Random Forest (RF), and Extreme Gradient Boosting (XGBoost).
Results: Here, we introduce a model-agnostic diagnostic framework to validate clinical machine learning models on time-stamped data, consisting of four stages. First, the framework evaluates performance by partitioning data from multiple years into training and validation cohorts. Second, it characterizes the temporal evolution of patient outcomes and characteristics. Third, model longevity and trade-offs between data quantity and recency are explored. Finally, feature importance and data valuation algorithms are applied for feature reduction and data quality assessment. When applied to predicting ACU in cancer patients, the framework highlights fluctuations in features, labels, and data values over time.
Conclusions: The work in this study emphasizes the importance of data timeliness and relevance. The results on ACU in cancer patients show moderate signs of drift and corroborate the relevance of temporal considerations when validating machine learning models for deployment at the point of care.