An urgent call for robust statistical methods in reliable feature importance analysis across machine learning

Impact factor 6.5; Zone 1, Chemistry; JCR Q2, Chemistry, Physical
Yoshiyasu Takefuji
DOI: 10.1016/j.jcat.2025.116098
Journal: Journal of Catalysis, vol. 446, Article 116098
Published: 2025-03-24
Source: https://www.sciencedirect.com/science/article/pii/S0021951725001630
Citations: 0

Abstract

Accurate analytical outcomes in machine learning are contingent on error-free calculations and a solid understanding of foundational principles. A notable challenge arises from the lack of ground truth values for validation, complicating the assessment of feature importance, especially when employing linear models with parametric assumptions. This paper critiques the use of Pearson correlation and feature importances derived from Gradient Boosting Regressor (GBR), emphasizing their limitations in analyzing nonlinear and nonparametric data. We propose robust statistical methods, such as Spearman’s correlation and Kendall’s tau, as alternatives for capturing complex relationships while providing essential directional information. Additionally, attention to Variance Inflation Factor (VIF) is crucial for mitigating feature inflation. By addressing these concerns, researchers can achieve more reliable analyses and deeper insight into variable relationships.
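The contrast the abstract draws between Pearson correlation and rank-based measures, together with its note on VIF, can be illustrated with a minimal sketch. Everything below is an assumption made for illustration and is not taken from the paper: the synthetic data, the helper names `correlation_summary` and `variance_inflation_factors`, and the choice to compute VIF directly as 1 / (1 - R^2) with NumPy least squares.

```python
import numpy as np
from scipy import stats


def correlation_summary(x, y):
    """Return Pearson r, Spearman rho, and Kendall tau for two 1-D arrays."""
    pearson_r, _ = stats.pearsonr(x, y)
    spearman_rho, _ = stats.spearmanr(x, y)
    kendall_tau, _ = stats.kendalltau(x, y)
    return {
        "pearson": float(pearson_r),
        "spearman": float(spearman_rho),
        "kendall": float(kendall_tau),
    }


def variance_inflation_factors(X):
    """VIF_j = 1 / (1 - R_j^2), where R_j^2 is obtained by regressing
    column j on all other columns plus an intercept."""
    X = np.asarray(X, dtype=float)
    n_samples, n_features = X.shape
    vifs = []
    for j in range(n_features):
        target = X[:, j]
        others = np.delete(X, j, axis=1)
        design = np.column_stack([np.ones(n_samples), others])
        coef, *_ = np.linalg.lstsq(design, target, rcond=None)
        residuals = target - design @ coef
        ss_res = residuals @ residuals
        ss_tot = ((target - target.mean()) ** 2).sum()
        r_squared = 1.0 - ss_res / ss_tot
        vifs.append(1.0 / (1.0 - r_squared))
    return np.array(vifs)


# Hypothetical data: a strictly monotonic but strongly nonlinear relationship.
rng = np.random.default_rng(0)
x = rng.uniform(0.0, 5.0, size=200)
inc = correlation_summary(x, np.exp(2.0 * x))    # increasing in x
dec = correlation_summary(x, -np.exp(2.0 * x))   # decreasing in x

# Hypothetical features: x2 is nearly a copy of x1, x3 is independent.
x1 = rng.normal(size=200)
x2 = x1 + rng.normal(scale=0.05, size=200)
x3 = rng.normal(size=200)
vifs = variance_inflation_factors(np.column_stack([x1, x2, x3]))

print("increasing:", inc)
print("decreasing:", dec)
print("VIF:", vifs)
```

In this sketch the rank-based coefficients recover the perfect monotonic association and its sign, while Pearson's r understates it because the relationship is not linear. The near-duplicate features receive large VIF values, which flags that importance assigned to either one is hard to interpret in isolation.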
Source journal
Journal of Catalysis (Engineering and Technology: Chemical Engineering)
CiteScore: 12.30
Self-citation rate: 5.50%
Articles published: 447
Review time: 31 days
Journal description: The Journal of Catalysis publishes scholarly articles on both heterogeneous and homogeneous catalysis, covering a wide range of chemical transformations. These include various types of catalysis, such as those mediated by photons, plasmons, and electrons. The focus of the studies is to understand the relationship between catalytic function and the underlying chemical properties of surfaces and metal complexes. The articles in the journal offer innovative concepts and explore the synthesis and kinetics of inorganic solids and homogeneous complexes. Furthermore, they discuss spectroscopic techniques for characterizing catalysts, investigate the interaction of probes and reacting species with catalysts, and employ theoretical methods. The research presented in the journal should have direct relevance to the field of catalytic processes, addressing either fundamental aspects or applications of catalysis.