An urgent call for robust statistical methods in reliable feature importance analysis across machine learning
Yoshiyasu Takefuji
Journal of Catalysis, volume 446, Article 116098 (published 2025-03-24)
DOI: 10.1016/j.jcat.2025.116098
URL: https://www.sciencedirect.com/science/article/pii/S0021951725001630
Impact factor: 6.5; JCR: Q2 (Chemistry, Physical)
Citations: 0
Abstract
Accurate analytical outcomes in machine learning are contingent on error-free calculations and a solid understanding of foundational principles. A notable challenge arises from the lack of ground truth values for validation, complicating the assessment of feature importance, especially when employing linear models with parametric assumptions. This paper critiques the use of Pearson correlation and feature importances derived from Gradient Boosting Regressor (GBR), emphasizing their limitations in analyzing nonlinear and nonparametric data. We propose robust statistical methods, such as Spearman’s correlation and Kendall’s tau, as alternatives for capturing complex relationships while providing essential directional information. Additionally, attention to Variance Inflation Factor (VIF) is crucial for mitigating feature inflation. By addressing these concerns, researchers can achieve more reliable analyses and deeper insight into variable relationships.
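The contrast the abstract draws between Pearson correlation and rank-based measures can be illustrated with a small sketch. This is not the paper's code or data: the synthetic series and the helper names (`pearson`, `ranks`, `spearman`, `kendall_tau`, `vif_two_features`) are assumptions made for illustration, written with the Python standard library only. The VIF helper uses the two-predictor special case 1 / (1 - r^2); with more predictors, r^2 would be replaced by the R^2 from regressing one feature on all the others.

```python
import math

def pearson(x, y):
    """Pearson correlation: measures linear association only."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def ranks(values):
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average_rank
        i = j + 1
    return result

def spearman(x, y):
    """Spearman's rho: Pearson correlation computed on the ranks."""
    return pearson(ranks(x), ranks(y))

def kendall_tau(x, y):
    """Kendall's tau-a: (concordant - discordant) / total pairs (no tie correction)."""
    n = len(x)
    score = 0
    for i in range(n):
        for j in range(i + 1, n):
            product = (x[i] - x[j]) * (y[i] - y[j])
            score += (product > 0) - (product < 0)
    return score / (n * (n - 1) / 2)

def vif_two_features(x1, x2):
    """VIF for the two-predictor case only: 1 / (1 - r^2)."""
    r = pearson(x1, x2)
    return 1.0 / (1.0 - r * r)

# Synthetic data (assumed for illustration): monotone but strongly nonlinear.
x = [float(i) for i in range(1, 21)]
y_up = [math.exp(0.5 * v) for v in x]    # increasing in x
y_down = [-(v ** 3) for v in x]          # decreasing in x
# A second feature that is almost a linear copy of x (near-collinear).
x_copy = [2 * v + 0.5 * (-1) ** i for i, v in enumerate(x)]

print("Pearson  (x, y_up):  ", pearson(x, y_up))
print("Spearman (x, y_up):  ", spearman(x, y_up))
print("Kendall  (x, y_up):  ", kendall_tau(x, y_up))
print("Spearman (x, y_down):", spearman(x, y_down))
print("VIF      (x, x_copy):", vif_two_features(x, x_copy))
```

On these series the rank-based measures report a perfect monotone relationship, and their sign gives its direction, while Pearson understates the exponential relationship because it is not linear. The large VIF for `x_copy` flags the near-duplicate feature, the kind of multicollinearity that can distort importance estimates.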
About the journal:
The Journal of Catalysis publishes scholarly articles on both heterogeneous and homogeneous catalysis, covering a wide range of chemical transformations. These include various types of catalysis, such as those mediated by photons, plasmons, and electrons. The focus of the studies is to understand the relationship between catalytic function and the underlying chemical properties of surfaces and metal complexes.
The articles in the journal offer innovative concepts and explore the synthesis and kinetics of inorganic solids and homogeneous complexes. Furthermore, they discuss spectroscopic techniques for characterizing catalysts, investigate the interaction of probes and reacting species with catalysts, and employ theoretical methods.
The research presented in the journal should have direct relevance to the field of catalytic processes, addressing either fundamental aspects or applications of catalysis.