Ultra-High Dimensional Model Averaging for Multi-Categorical Response
Authors: Jing Lv, Chaohui Guo
Journal: Communications in Mathematics and Statistics (journal article, published 2024-06-27)
DOI: 10.1007/s40304-023-00379-x (https://doi.org/10.1007/s40304-023-00379-x)
Citation count: 0
Abstract
Over the past decades, model averaging has been regarded as a powerful tool for model-based prediction. However, its application to ultra-high dimensional multi-categorical data faces challenges arising from model uncertainty and heterogeneity. In this article, a novel two-step model averaging method is proposed for multi-categorical responses when the number of covariates is ultra-high. First, a class of adaptive multinomial logistic regression candidate models is constructed, in which each category is allowed its own set of covariates to accommodate heterogeneity. Second, the optimal model weights are chosen by a criterion based on the Kullback–Leibler loss plus a penalty term. We show that the proposed model averaging estimator is asymptotically optimal in the sense that it achieves the minimum Kullback–Leibler loss among all possible averaging estimators. Empirical evidence from simulation studies and a real data example demonstrates that the proposed method outperforms state-of-the-art approaches.
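The second step (choosing weights by a Kullback–Leibler-type loss plus a penalty) can be illustrated with a small sketch. The abstract does not give the exact criterion, so everything specific here is an assumption made for illustration: the loss is taken as the negative log-likelihood of the weighted-average class probabilities, the penalty as `lam` times the weighted parameter count, and the solver is a simple exponentiated-gradient loop. The names `criterion`, `fit_weights`, `n_params`, and `lam` are hypothetical and are not taken from the paper.

```python
import math

def averaged_prob(weights, cand_probs, i, label):
    """Weighted average of the candidates' probabilities for the true label of observation i."""
    return sum(w * p[i][label] for w, p in zip(weights, cand_probs))

def criterion(weights, cand_probs, labels, n_params, lam):
    """Assumed criterion: negative log-likelihood of the averaged probabilities
    plus lam * (weighted number of parameters). Not the paper's exact form."""
    nll = -sum(math.log(averaged_prob(weights, cand_probs, i, y))
               for i, y in enumerate(labels))
    penalty = lam * sum(w * k for w, k in zip(weights, n_params))
    return nll + penalty

def fit_weights(cand_probs, labels, n_params, lam=1.0, step=0.1, iters=2000):
    """Minimise the criterion over the probability simplex with
    exponentiated-gradient updates (an illustrative solver choice)."""
    m, n = len(cand_probs), len(labels)
    w = [1.0 / m] * m  # start from equal weights
    for _ in range(iters):
        # Gradient of the criterion with respect to each weight.
        grad = [lam * k for k in n_params]
        for i, y in enumerate(labels):
            avg = averaged_prob(w, cand_probs, i, y)
            for j in range(m):
                grad[j] -= cand_probs[j][i][y] / avg
        # Shifting by the minimum gradient leaves the normalised update
        # unchanged and keeps every exponent non-positive (no overflow).
        g_min = min(grad)
        w = [wj * math.exp(-step * (gj - g_min) / n) for wj, gj in zip(w, grad)]
        total = sum(w)
        w = [wj / total for wj in w]  # renormalise onto the simplex
    return w

# Toy data: 3 categories, 6 observations, 2 candidate models.
labels = [0, 1, 2, 0, 1, 2]
good = [[0.8 if k == y else 0.1 for k in range(3)] for y in labels]  # informative candidate
flat = [[1.0 / 3.0] * 3 for _ in labels]                             # uninformative candidate
cand_probs = [good, flat]
n_params = [4, 2]  # hypothetical parameter counts of the two candidates

weights = fit_weights(cand_probs, labels, n_params, lam=0.1)
print(weights)
```

In this toy setting the informative candidate should receive most of the weight. In practice the candidate probabilities would come from fitted multinomial logistic regression models, and the penalty and optimizer would follow the paper rather than the assumptions made here.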
About the journal:
Communications in Mathematics and Statistics is an international journal published by Springer-Verlag in collaboration with the School of Mathematical Sciences, University of Science and Technology of China (USTC). The journal is committed to publishing high-level, original, peer-reviewed research papers in various areas of the mathematical sciences, including pure mathematics, applied mathematics, computational mathematics, and probability and statistics. Typically, one volume is published each year, and each volume consists of four issues.