Corban Allenbrand, Ben Sherwood. The Annals of Applied Statistics. Published 2023-03-01. https://doi.org/10.1214/22-aoas1647
Model selection uncertainty and stability in beta regression models: A study of bootstrap-based model averaging with an empirical application to clickstream data
Statistical model development is a central feature of many scientific investigations and draws on a vast methodological landscape. However, uncertainty in the model development process has received less attention and is frequently resolved non-rigorously through beliefs about generalizability, practical usefulness, and computational ease. This is particularly problematic in settings of abundant data, such as clickstream data, because model selection routinely admits multiple models and imposes a source of uncertainty, unacknowledged and unknown by many, on all post-selection conclusions. Regression models based on the beta distribution are a class of non-linear models, attractive because of their great flexibility and potential explanatory power, but they have not been investigated from the standpoint of multi-model uncertainty and model averaging. For this reason, a formalized tool that combines model selection uncertainty and beta regression modeling is presented in this work. The tool combines bootstrap model averaging, model selection, and asymptotic theory to yield a procedure that can perform joint modeling of the mean and precision parameters, capture sources of variability in the data, and achieve more accurate claims of estimate precision, variable importance, generalization performance, and model stability. The practical utility of the tool is demonstrated through a study of model selection consistency and variable importance in statistical models of average exit and bounce rates. This work emphasizes the necessity of a departure from the all-too-common practice of ignoring model selection uncertainty and introduces an accessible technique to handle frequently neglected aspects of the modeling pipeline.
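For orientation, the ingredients named in the abstract can be written out in a generic form. The block below is a sketch under stated assumptions, not the authors' exact procedure: it uses the common mean-precision parameterization of the beta distribution, and the link functions g and h, the covariate vectors x_i and z_i, the number of bootstrap replicates B, and the selected model M_b are illustrative symbols that do not appear in the abstract.

```latex
% Beta density in mean-precision form (assumed parameterization),
% with mean \mu \in (0,1) and precision \phi > 0:
\[
  f(y;\mu,\phi) =
  \frac{\Gamma(\phi)}{\Gamma(\mu\phi)\,\Gamma\big((1-\mu)\phi\big)}\,
  y^{\mu\phi-1}(1-y)^{(1-\mu)\phi-1}, \qquad 0<y<1,
\]
\[
  \mathrm{E}(y)=\mu, \qquad \mathrm{Var}(y)=\frac{\mu(1-\mu)}{1+\phi}.
\]

% Joint modeling of mean and precision through link functions
% (g is often the logit and h the log; both are assumptions here):
\[
  g(\mu_i) = x_i^{\top}\beta, \qquad h(\phi_i) = z_i^{\top}\gamma .
\]

% Generic bootstrap model averaging: for b = 1,\dots,B, resample the data,
% select a model M_b, and refit it. A coefficient whose variable is not
% selected in replicate b is set to zero in that replicate.
\[
  \hat{\pi}_j = \frac{1}{B}\sum_{b=1}^{B} \mathbf{1}\{\, j \in M_b \,\},
  \qquad
  \bar{\beta}_j = \frac{1}{B}\sum_{b=1}^{B} \hat{\beta}_j^{(b)} .
\]
```

In this form, the selection frequency \hat{\pi}_j serves as a simple measure of variable importance and model stability, while the spread of the \hat{\beta}_j^{(b)} across replicates reflects both sampling variability and model selection uncertainty.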