Residual permutation tests for feature importance in machine learning.

IF 1.8; Zone 3 (Psychology); JCR Q3 (Mathematics, Interdisciplinary Applications)
Po-Hsien Huang
British Journal of Mathematical & Statistical Psychology. Journal Article. Published 2025-08-30. DOI: 10.1111/bmsp.70009 (https://doi.org/10.1111/bmsp.70009)

Abstract

Psychological research has traditionally relied on linear models to test scientific hypotheses. However, the emergence of machine learning (ML) algorithms has opened new opportunities for exploring variable relationships beyond linear constraints. To interpret the outcomes of these 'black-box' algorithms, various tools for assessing feature importance have been developed. However, most of these tools are descriptive and do not facilitate statistical inference. To address this gap, our study introduces two versions of residual permutation tests (RPTs), designed to assess the significance of a target feature in predicting the label. The first variant, RPT on Y (RPT-Y), permutes the residuals of the label conditioned on features other than the target. The second variant, RPT on X (RPT-X), permutes the residuals of the target feature conditioned on the other features. Through a comprehensive simulation study, we show that RPT-X maintains empirical Type I error rates under the nominal level across a wide range of ML algorithms and demonstrates appropriate statistical power in both regression and classification contexts. These findings suggest the utility of RPT-X for hypothesis testing in ML applications.
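
The two variants described in the abstract can be illustrated with a minimal sketch. Everything beyond the abstract's wording is an assumption made for illustration: the helper names (`ols_fit_predict`, `model_loss`, `residual_permutation_test`) are hypothetical, ordinary least squares stands in both for the conditional model that produces the residuals and for the "ML algorithm", and in-sample mean squared error stands in for the test statistic. The paper's actual models, statistic, and resampling details are not given in the abstract and may differ.

```python
import numpy as np

def ols_fit_predict(Z, t):
    """Least-squares fit of t on Z (plus intercept); returns fitted values."""
    A = np.column_stack([np.ones(len(Z)), Z])
    coef, *_ = np.linalg.lstsq(A, t, rcond=None)
    return A @ coef

def model_loss(X, y):
    """Stand-in for 'fit an ML model and score it': in-sample MSE of OLS.
    ASSUMPTION: any learner and loss could be substituted here."""
    return float(np.mean((y - ols_fit_predict(X, y)) ** 2))

def residual_permutation_test(X, y, j, variant="X", n_perm=199, seed=0):
    """Permutation p-value for feature j (hypothetical helper).

    variant="X": permute residuals of X[:, j] given the other features.
    variant="Y": permute residuals of y given the other features.
    """
    rng = np.random.default_rng(seed)
    others = np.delete(X, j, axis=1)
    observed = model_loss(X, y)

    # Residualize the chosen variable on the non-target features.
    target = X[:, j] if variant == "X" else y
    fitted = ols_fit_predict(others, target)
    resid = target - fitted

    null_losses = np.empty(n_perm)
    for b in range(n_perm):
        # Surrogate keeps the dependence on the other features but breaks
        # any remaining link between the target feature and the label.
        surrogate = fitted + rng.permutation(resid)
        if variant == "X":
            X_b = X.copy()
            X_b[:, j] = surrogate
            null_losses[b] = model_loss(X_b, y)
        else:
            null_losses[b] = model_loss(X, surrogate)

    # A small observed loss relative to the null losses suggests importance.
    return float((1 + np.sum(null_losses <= observed)) / (n_perm + 1))

# Simulated usage: feature 0 affects y, feature 2 does not.
rng = np.random.default_rng(42)
n = 200
x0 = rng.normal(size=n)
x1 = 0.5 * x0 + rng.normal(size=n)   # correlated with x0
x2 = rng.normal(size=n)              # pure noise feature
X = np.column_stack([x0, x1, x2])
y = 1.0 * x0 + 0.5 * x1 + rng.normal(size=n)

p_x_signal = residual_permutation_test(X, y, j=0, variant="X")
p_y_signal = residual_permutation_test(X, y, j=0, variant="Y")
p_x_noise = residual_permutation_test(X, y, j=2, variant="X")
print(p_x_signal, p_y_signal, p_x_noise)
```

With a flexible learner in place of OLS, an in-sample loss would reward overfitting, so a held-out or cross-validated loss would likely be a more sensible statistic; that choice is left open here.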

Source journal
CiteScore: 5.00
Self-citation rate: 3.80%
Articles published: 34
Review time: >12 weeks
About the journal: The British Journal of Mathematical and Statistical Psychology publishes articles relating to areas of psychology whose arguments have a greater mathematical or statistical aspect than is usually acceptable to other journals, including:
• mathematical psychology
• statistics
• psychometrics
• decision making
• psychophysics
• classification
• relevant areas of mathematics, computing and computer software
These include articles that address substantive psychological issues or that develop and extend techniques useful to psychologists. New models for psychological processes, new approaches to existing data, critiques of existing models and improved algorithms for estimating the parameters of a model are examples of articles which may be favoured.