Residual permutation tests for feature importance in machine learning.

IF 1.8 | Zone 3, Psychology | Q3 MATHEMATICS, INTERDISCIPLINARY APPLICATIONS

British Journal of Mathematical & Statistical Psychology Pub Date : 2025-08-30 DOI:10.1111/bmsp.70009

Po-Hsien Huang


Citations: 0

Abstract

Psychological research has traditionally relied on linear models to test scientific hypotheses. However, the emergence of machine learning (ML) algorithms has opened new opportunities for exploring variable relationships beyond linear constraints. To interpret the outcomes of these 'black-box' algorithms, various tools for assessing feature importance have been developed. However, most of these tools are descriptive and do not facilitate statistical inference. To address this gap, our study introduces two versions of residual permutation tests (RPTs), designed to assess the significance of a target feature in predicting the label. The first variant, RPT on Y (RPT-Y), permutes the residuals of the label conditioned on features other than the target. The second variant, RPT on X (RPT-X), permutes the residuals of the target feature conditioned on the other features. Through a comprehensive simulation study, we show that RPT-X maintains empirical Type I error rates under the nominal level across a wide range of ML algorithms and demonstrates appropriate statistical power in both regression and classification contexts. These findings suggest the utility of RPT-X for hypothesis testing in ML applications.
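
The two variants described in the abstract can be sketched as follows. This is a minimal illustration, not the author's implementation: the abstract does not say how the conditional residuals are estimated or which test statistic is used, so the linear conditional model (`ols_fitted`), the k-nearest-neighbour learner with a leave-one-out loss (`knn_loo_mse`), the single "other" feature `z`, and every function name here are assumptions made for the example.

```python
import random


def ols_fitted(z, t):
    """Fitted values from a simple least-squares fit of t on one covariate z.

    Assumption: a linear model is used for the conditioning step.
    """
    n = len(z)
    mz, mt = sum(z) / n, sum(t) / n
    szz = sum((a - mz) ** 2 for a in z)
    slope = sum((a - mz) * (b - mt) for a, b in zip(z, t)) / szz
    return [mt + slope * (a - mz) for a in z]


def knn_loo_mse(x, z, y, k=5):
    """Leave-one-out MSE of a k-nearest-neighbour regressor on (x, z) -> y.

    Assumption: stands in for whatever ML algorithm and loss are being tested.
    """
    n = len(y)
    total = 0.0
    for i in range(n):
        dists = sorted(
            ((x[i] - x[j]) ** 2 + (z[i] - z[j]) ** 2, j)
            for j in range(n) if j != i
        )
        pred = sum(y[j] for _, j in dists[:k]) / k
        total += (y[i] - pred) ** 2
    return total / n


def residual_permutation_test(x, z, y, variant="X", n_perm=99, seed=0,
                              loss=knn_loo_mse):
    """Permutation p-value for the target feature x given the other feature z.

    variant="X": permute residuals of x conditioned on z (RPT-X).
    variant="Y": permute residuals of y conditioned on z (RPT-Y).
    """
    if variant not in ("X", "Y"):
        raise ValueError("variant must be 'X' or 'Y'")
    rng = random.Random(seed)
    base = x if variant == "X" else y
    fitted = ols_fitted(z, base)
    resid = [b - f for b, f in zip(base, fitted)]
    observed = loss(x, z, y)
    as_good = 0
    for _ in range(n_perm):
        shuffled = resid[:]
        rng.shuffle(shuffled)
        # Rebuild the variable from its fitted part plus permuted residuals.
        star = [f + e for f, e in zip(fitted, shuffled)]
        perm = loss(star, z, y) if variant == "X" else loss(x, z, star)
        if perm <= observed:
            as_good += 1
    # Add-one permutation p-value: small when the real data predict better.
    return (1 + as_good) / (1 + n_perm)


# Synthetic data (hypothetical) in which x has a strong effect on y.
data_rng = random.Random(1)
n = 60
z = [data_rng.gauss(0, 1) for _ in range(n)]
x = [0.5 * zi + data_rng.gauss(0, 1) for zi in z]
y = [2 * xi + zi + data_rng.gauss(0, 0.3) for xi, zi in zip(x, z)]

p_x = residual_permutation_test(x, z, y, variant="X")
p_y = residual_permutation_test(x, z, y, variant="Y")
print(p_x, p_y)
```

Passing `loss` as a parameter keeps the permutation scheme separate from the learner, so a different algorithm or a classification loss could be substituted without touching the resampling logic.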


Source journal

British Journal of Mathematical & Statistical Psychology (Medicine: Mathematics, Interdisciplinary Applications)

CiteScore

5.00

Self-citation rate

3.80%

Articles published

Review time

>12 weeks

Journal description: The British Journal of Mathematical and Statistical Psychology publishes articles relating to areas of psychology which have a greater mathematical or statistical aspect of their argument than is usually acceptable to other journals, including:

• mathematical psychology
• statistics
• psychometrics
• decision making
• psychophysics
• classification
• relevant areas of mathematics, computing and computer software

These include articles that address substantive psychological issues or that develop and extend techniques useful to psychologists. New models for psychological processes, new approaches to existing data, critiques of existing models and improved algorithms for estimating the parameters of a model are examples of articles which may be favoured.