Distributed Mallows Model Averaging for Ridge Regressions
Authors: Haili Zhang, Alan T. K. Wan, Kang You, Guohua Zou
Journal: Acta Mathematica Sinica-English Series, 41(2), pp. 780-826
Publication date: 2025-02-15
DOI: 10.1007/s10114-025-3409-x
URL: https://link.springer.com/article/10.1007/s10114-025-3409-x
Ridge regression is an effective tool for handling multicollinearity in regression. It is also an essential shrinkage and regularization method and is widely used in big data and distributed data applications. The divide-and-conquer approach, which combines the estimators from the subsets with equal weights, is commonly applied to distributed data. To overcome multicollinearity and improve estimation accuracy with distributed data, we propose a Mallows-type model averaging method for ridge regression that combines the estimators from all subsets. The method is proved to be asymptotically optimal while allowing both the number of subsets and the dimension of the variables to diverge. We also show that the resulting weight estimators are consistent for the theoretically optimal weights, and we establish the asymptotic normality of the model averaging estimator. Our simulation study and real data analysis show that the proposed method often performs better than commonly used model selection and model averaging methods in distributed data settings.
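The equal-weight divide-and-conquer combination described in the abstract, and the general weighted combination of which it is a special case, can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' procedure: the function names (`solve`, `ridge`, `split`, `combine`, `distributed_ridge`), the penalty parameter `lam`, and the contiguous splitting rule are all hypothetical. The abstract does not state the Mallows-type criterion used to choose the weights, so that step is not implemented here; the weights are simply passed in.

```python
def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [bi] for row, bi in zip(A, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - s) / M[i][i]
    return x


def ridge(X, y, lam):
    """Ridge estimator (X'X + lam * I)^{-1} X'y for one data subset."""
    p = len(X[0])
    XtX = [[sum(row[i] * row[j] for row in X) + (lam if i == j else 0.0)
            for j in range(p)] for i in range(p)]
    Xty = [sum(row[i] * yi for row, yi in zip(X, y)) for i in range(p)]
    return solve(XtX, Xty)


def split(X, y, K):
    """Split the rows into K contiguous subsets (an assumed splitting rule)."""
    n = len(y)
    bounds = [k * n // K for k in range(K + 1)]
    return [(X[a:b], y[a:b]) for a, b in zip(bounds, bounds[1:])]


def combine(estimates, weights):
    """Weighted combination of the subset estimators."""
    p = len(estimates[0])
    return [sum(w * est[j] for w, est in zip(weights, estimates))
            for j in range(p)]


def distributed_ridge(X, y, K, lam, weights=None):
    """Fit ridge on each subset, then combine.

    With weights=None this is the equal-weight divide-and-conquer estimator.
    Data-driven weights would be supplied through `weights`; how the paper
    selects them is not specified in the abstract.
    """
    estimates = [ridge(Xk, yk, lam) for Xk, yk in split(X, y, K)]
    if weights is None:
        weights = [1.0 / K] * K
    return combine(estimates, weights)
```

Calling `distributed_ridge(X, y, K, lam)` gives the equal-weight baseline. Passing a list of K nonnegative weights that sum to one gives a model-averaging combination of the kind the abstract proposes, with the weight selection left to whatever criterion is used.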
About the journal:
Acta Mathematica Sinica, established by the Chinese Mathematical Society in 1936, is the first and the best mathematical journal in China. In 1985, Acta Mathematica Sinica was divided into an English Series and a Chinese Series. The English Series is a monthly journal that publishes significant research papers from all branches of pure and applied mathematics. It provides authoritative reviews of current developments in mathematical research. Contributions are invited from researchers all over the world.