Stronger Baseline Models -- A Key Requirement for Aligning Machine Learning Research with Clinical Utility
Nathan Wolfrath, Joel Wolfrath, Hengrui Hu, Anjishnu Banerjee, Anai N. Kothari
arXiv - CS - Machine Learning, 2024-09-18, https://doi.org/arxiv-2409.12116
Abstract
Machine Learning (ML) research has increased substantially in recent years,
due to the success of predictive modeling across diverse application domains.
However, well-known barriers impede the deployment of ML models in
high-stakes clinical settings, including a lack of model transparency (or the
inability to audit the inference process), large training data requirements
with siloed data sources, and complicated metrics for measuring model utility.
In this work, we show empirically that including stronger baseline models in
healthcare ML evaluations has important downstream effects that aid
practitioners in addressing these challenges. Through a series of case studies,
we find that the common practice of omitting baselines or comparing against a
weak baseline model (e.g., a linear model with no optimization) obscures the
value of ML methods proposed in the research literature. Using these insights,
we propose some best practices that will enable practitioners to more
effectively study and deploy ML models in clinical settings.
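
To make the contrast between a weak and a stronger baseline concrete, here is a minimal, dependency-free sketch. It is not the authors' experimental setup: the synthetic data, the `DEFAULT` setting, the `GRID` values, and all function names are assumptions chosen for illustration. It fits the same logistic regression twice, once with a fixed untuned setting and once with the setting selected on a validation split.

```python
import math
import random

def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def make_data(n, rng):
    # Synthetic stand-in for a clinical table: two features, binary outcome.
    rows = []
    for _ in range(n):
        x = [rng.gauss(0, 1), rng.gauss(0, 1)]
        p = sigmoid(1.5 * x[0] - 1.0 * x[1])
        rows.append((x, 1 if rng.random() < p else 0))
    return rows

def fit(data, lr, l2, epochs=200):
    # Batch gradient descent on the L2-penalized logistic loss.
    w, b, n = [0.0, 0.0], 0.0, len(data)
    for _ in range(epochs):
        gw, gb = [0.0, 0.0], 0.0
        for x, y in data:
            err = sigmoid(w[0] * x[0] + w[1] * x[1] + b) - y
            gw[0] += err * x[0]
            gw[1] += err * x[1]
            gb += err
        w = [wi - lr * (gi / n + l2 * wi) for wi, gi in zip(w, gw)]
        b -= lr * gb / n
    return w, b

def log_loss(model, data):
    # Mean negative log-likelihood; probabilities are clipped to avoid log(0).
    w, b = model
    total = 0.0
    for x, y in data:
        p = min(max(sigmoid(w[0] * x[0] + w[1] * x[1] + b), 1e-12), 1 - 1e-12)
        total -= math.log(p) if y == 1 else math.log(1 - p)
    return total / len(data)

rng = random.Random(0)
train, val = make_data(300, rng), make_data(300, rng)

DEFAULT = (0.01, 0.0)  # hypothetical "untuned" (learning rate, L2 penalty)
GRID = [(lr, l2) for lr in (0.01, 0.1, 1.0) for l2 in (0.0, 0.01, 0.1)]

# Weak baseline: one fixed setting, never revisited.
untuned_loss = log_loss(fit(train, *DEFAULT), val)

# Stronger baseline: same model family, setting chosen on the validation split.
tuned_cfg, tuned_loss = min(
    ((cfg, log_loss(fit(train, *cfg), val)) for cfg in GRID),
    key=lambda pair: pair[1],
)

print(f"untuned baseline val log-loss: {untuned_loss:.4f}")
print(f"tuned baseline   val log-loss: {tuned_loss:.4f} with (lr, l2)={tuned_cfg}")
```

Because the grid contains the default setting, the tuned baseline can never score worse than the untuned one on the validation split. A fair comparison against a proposed method would still need a separate held-out test set, since the validation split has been used for selection.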