Multi-source Stable Variable Importance Measure via Adversarial Machine Learning
Zitao Wang, Nian Si, Zijian Guo, Molei Liu
arXiv - STAT - Methodology, published 2024-09-11
DOI: arxiv-2409.07380 (https://doi.org/arxiv-2409.07380)
Citations: 0
Abstract
As part of enhancing the interpretability of machine learning, there is
renewed interest in quantifying and inferring the predictive importance of certain
exposure covariates. Modern scientific studies often collect data from multiple
sources with distributional heterogeneity. Thus, measuring and inferring stable
associations across multiple environments is crucial for reliable and
generalizable decision-making. In this paper, we propose MIMAL, a novel
statistical framework for Multi-source stable Importance Measure via
Adversarial Learning. MIMAL measures the importance of some exposure variables
by maximizing the worst-case predictive reward over the source mixture. Our
framework allows various machine learning methods for confounding adjustment
and exposure effect characterization. For inferential analysis, the asymptotic
normality of our introduced statistic is established under a general machine
learning framework that requires no stronger learning accuracy conditions than
those for single-source variable importance. Numerical studies with various
types of data-generating setups and machine learning implementations are
conducted to demonstrate the finite-sample performance of MIMAL. We also illustrate
our method through a real-world study of Beijing air pollution in multiple
locations.
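
The idea of maximizing a worst-case predictive reward over source mixtures can be illustrated with a toy sketch. This is not the authors' MIMAL implementation: it assumes only two sources, a linear predictor, and a hypothetical explained-variance reward built from known second moments (`S_list` and `c_list` are illustrative names, not from the paper). It omits confounding adjustment and inference entirely.

```python
import numpy as np


def reward(beta, S, c):
    """Reward of a linear predictor x @ beta in one source.

    Equals E[Y^2] - E[(Y - X beta)^2] = 2 beta'c - beta'S beta,
    with S = E[X X'] and c = E[X Y] (assumed known here).
    """
    return 2.0 * beta @ c - beta @ S @ beta


def maximin_importance(S_list, c_list, grid_size=1001):
    """Toy two-source maximin: max over beta of min over mixtures q.

    The reward is concave in beta and linear in q, so max-min equals
    min-max; for each q the inner maximizer has a closed form.
    Assumes every mixture of the S matrices is positive definite.
    """
    best = None
    for q in np.linspace(0.0, 1.0, grid_size):
        S = q * S_list[0] + (1.0 - q) * S_list[1]
        c = q * c_list[0] + (1.0 - q) * c_list[1]
        beta = np.linalg.solve(S, c)   # maximizer of the mixture reward
        value = float(c @ beta)        # mixture reward attained at beta
        if best is None or value < best[0]:
            best = (value, float(q), beta)
    return best


# Hypothetical moments: covariate 0 has the same association with Y in
# both sources, covariate 1 flips sign between them.
S_list = [np.eye(2), np.eye(2)]
c_list = [np.array([1.0, 1.0]), np.array([1.0, -1.0])]

value, q_star, beta_star = maximin_importance(S_list, c_list)
worst = min(reward(beta_star, S, c) for S, c in zip(S_list, c_list))
print(value, q_star, beta_star, worst)
```

In this toy setup the grid search settles on a mixture weight near 0.5 and a coefficient vector close to (1, 0): the covariate whose association is stable across sources keeps its weight, while the sign-flipping covariate is shrunk to zero.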