Beyond predictive accuracy: Statistical validation of feature importance in biomedical machine learning

IF 4.8; Zone 2 (Medicine); Q1 (Computer Science, Interdisciplinary Applications)

Computer Methods and Programs in Biomedicine. Pub Date: 2025-09-24. DOI: 10.1016/j.cmpb.2025.109085

Souichi Oka, Nobuko Inoue, Yoshiyasu Takefuji

Volume 272, Article 109085. Publication type: Journal Article. Open access: no. Full text: https://www.sciencedirect.com/science/article/pii/S0169260725005024

Citations: 0

Abstract

In medical machine learning (ML), optimizing a model for predictive performance is methodologically distinct from pursuing causal inference for mechanistic interpretation. High predictive accuracy does not imply that a model has uncovered the true physiological mechanisms underlying the data. This letter addresses that interpretational challenge, building on Yuyang Yan et al.’s valuable work on exacerbation classification in asthma and COPD. Their multi-feature fusion model, which combines models such as K-Nearest Neighbors (KNN), Support Vector Machines (SVM), Random Forest (RF), and Bidirectional Long Short-Term Memory (BiLSTM), demonstrates high predictive accuracy for respiratory exacerbations, but we highlight that such performance alone does not guarantee reliable insights into feature importance. Complex tree-based models like RF, when interpreted via methods like SHapley Additive exPlanations (SHAP), can exhibit inherent biases: they may overemphasize features used in early splits and reflect what matters for the model's own predictions rather than the true underlying physiological drivers. Validating feature importance remains challenging without ground truth, because different models often yield different rankings. We argue that relying solely on model-dependent interpretations risks misrepresenting the actual mechanisms of complex medical phenomena. We therefore advocate a robust analytical strategy that goes beyond predictive metrics: combining the predictive power of ML with impartial, complementary statistical methods, such as non-parametric correlation and mutual information, to obtain trustworthy scientific insight into the true drivers of respiratory exacerbations.
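The two model-independent statistics named in the abstract can be illustrated with a small sketch. This is not the letter's analysis: the data are synthetic, the feature names (`linked_monotone`, `linked_nonmonotone`, `noise`) are hypothetical, and the mutual information is a simple equal-width-bin plug-in estimate chosen here only for illustration. It uses the Python standard library so that each step is visible.

```python
import math
import random
from collections import Counter


def average_ranks(values):
    """Return 1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of 0-based positions i..j, shifted to 1-based
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def pearson(a, b):
    """Pearson correlation; returns 0.0 if either input is constant."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / math.sqrt(va * vb) if va > 0 and vb > 0 else 0.0


def spearman_rho(x, y):
    """Spearman rank correlation: Pearson correlation of the average ranks."""
    return pearson(average_ranks(x), average_ranks(y))


def mutual_information(x, y, bins=8):
    """Plug-in MI estimate (nats) between a continuous feature x,
    cut into equal-width bins, and discrete labels y."""
    lo, hi = min(x), max(x)
    width = (hi - lo) / bins or 1.0  # guard against a constant feature
    xb = [min(int((v - lo) / width), bins - 1) for v in x]
    n = len(x)
    pxy, px, py = Counter(zip(xb, y)), Counter(xb), Counter(y)
    return sum((c / n) * math.log(c * n / (px[a] * py[b]))
               for (a, b), c in pxy.items())


# Synthetic data (an assumption for illustration, not clinical data).
random.seed(0)
labels = [random.randint(0, 1) for _ in range(500)]
features = {
    # shifts upward with the label: a monotone association
    "linked_monotone": [lab + random.gauss(0, 0.5) for lab in labels],
    # label 1 sits near -2 or +2, label 0 near 0: dependent but not monotone
    "linked_nonmonotone": [
        (random.choice([-2.0, 2.0]) if lab else 0.0) + random.gauss(0, 0.3)
        for lab in labels
    ],
    # unrelated to the label
    "noise": [random.gauss(0, 1) for _ in labels],
}

scores = {
    name: (spearman_rho(col, labels), mutual_information(col, labels))
    for name, col in features.items()
}
for name, (rho, mi) in scores.items():
    print(f"{name:20s} rho={rho:+.3f}  MI={mi:.3f} nats")
```

The non-monotone feature is included because rank correlation can miss a dependence that mutual information picks up. That is one reason to read the two statistics together, and alongside any model-derived ranking, instead of relying on one of them.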




Source journal

Computer Methods and Programs in Biomedicine (Engineering and Technology: Biomedical Engineering)

CiteScore: 12.30

Self-citation rate: 6.60%

Articles published: 601

Review time: 135 days

About the journal: To encourage the development of formal computing methods, and their application in biomedical research and medical practice, by illustration of fundamental principles in biomedical informatics research; to stimulate basic research into application software design; to report the state of research of biomedical information processing projects; to report new computer methodologies applied in biomedical areas; to support the eventual distribution of demonstrable software to avoid duplication of effort; to provide a forum for discussion and improvement of existing software; to optimize contact between national organizations and regional user groups by promoting an international exchange of information on formal methods, standards and software in biomedicine. Computer Methods and Programs in Biomedicine covers computing methodology and software systems derived from computing science for implementation in all aspects of biomedical research and medical practice. It is designed to serve: biochemists; biologists; geneticists; immunologists; neuroscientists; pharmacologists; toxicologists; clinicians; epidemiologists; psychiatrists; psychologists; cardiologists; chemists; (radio)physicists; computer scientists; programmers and systems analysts; biomedical, clinical, electrical and other engineers; teachers of medical informatics and users of educational software.