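Model selection uncertainty and stability in beta regression models: A study of bootstrap-based model averaging with an empirical application to clickstream data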

The Annals of Applied Statistics, Pub Date: 2023-03-01, DOI: 10.1214/22-aoas1647

Corban Allenbrand, Ben Sherwood

Citations: 1

Abstract

Model selection uncertainty and stability in beta regression models: A study of bootstrap-based model averaging with an empirical application to clickstream data

Statistical model development is a central feature of many scientific investigations with a vast methodological landscape. However, uncertainty in the model development process has received less attention and is frequently resolved non-rigorously through beliefs about generalizability, practical usefulness, and computational ease. This is particularly problematic in settings of abundant data, such as clickstream data, as model selection routinely admits multiple models and imposes a source of uncertainty, unacknowledged and unknown by many, on all post-selection conclusions. Regression models based on the beta distribution are a class of non-linear models, attractive because of their great flexibility and potential explanatory power, but have not been investigated from the standpoint of multi-model uncertainty and model averaging. For this reason, a formalized tool that can combine model selection uncertainty and beta regression modeling is presented in this work. The tool combines bootstrap model averaging, model selection, and asymptotic theory to yield a procedure that can perform joint modeling of the mean and precision parameters, capture sources of variability in the data, and achieve more accurate claims of estimate precision, variable importance, generalization performance, and model stability. Practical utility of the tool is demonstrated through a study of model selection consistency and variable importance in average exit and bounce rate statistical models. This work emphasizes the necessity of a departure from the all-too-common practice of ignoring model selection uncertainty and introduces an accessible technique to handle frequently neglected aspects of the modeling pipeline.
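The general idea of bootstrap model averaging for a beta regression can be illustrated with a small sketch. It is not the authors' procedure: it assumes a logit link for the mean, holds the precision constant rather than modeling it jointly, searches all subsets of the mean-model predictors, and selects by AIC. The data are simulated, and every name (`neg_loglik`, `fit_aic`, `bootstrap_model_average`, `x1`, `x2`) is hypothetical.

```python
import itertools
from collections import Counter

import numpy as np
from scipy.optimize import minimize
from scipy.special import expit, gammaln


def neg_loglik(params, X, y):
    """Negative beta log-likelihood: logit link for the mean, constant log-precision."""
    beta, log_phi = params[:-1], params[-1]
    mu = np.clip(expit(X @ beta), 1e-8, 1 - 1e-8)
    phi = np.exp(log_phi)
    a, b = mu * phi, (1.0 - mu) * phi
    ll = (gammaln(phi) - gammaln(a) - gammaln(b)
          + (a - 1.0) * np.log(y) + (b - 1.0) * np.log1p(-y))
    return -ll.sum()


def fit_aic(X, y):
    """Fit one candidate model by maximum likelihood; return (AIC, mean coefficients)."""
    k = X.shape[1]
    start = np.zeros(k + 1)
    bounds = [(-10.0, 10.0)] * k + [(-3.0, 8.0)]  # keep the optimizer in a safe region
    res = minimize(neg_loglik, start, args=(X, y), method="L-BFGS-B", bounds=bounds)
    return 2.0 * res.fun + 2.0 * (k + 1), res.x[:-1]


def bootstrap_model_average(X, y, names, n_boot=30, seed=0):
    """Select the AIC-best subset on each bootstrap resample and average the results."""
    rng = np.random.default_rng(seed)
    n, p = X.shape
    subsets = [s for k in range(p + 1) for s in itertools.combinations(range(p), k)]
    selected = Counter()
    coefs = np.zeros((n_boot, p))          # excluded predictors contribute a zero
    included = np.zeros((n_boot, p), dtype=bool)
    for b in range(n_boot):
        idx = rng.integers(0, n, size=n)   # resample rows with replacement
        Xb, yb = X[idx], y[idx]
        best = None
        for s in subsets:
            design = np.column_stack([np.ones(n), Xb[:, list(s)]])
            aic, beta = fit_aic(design, yb)
            if best is None or aic < best[0]:
                best = (aic, s, beta)
        _, s, beta = best
        selected[tuple(names[j] for j in s)] += 1
        coefs[b, list(s)] = beta[1:]       # skip the intercept
        included[b, list(s)] = True
    model_freq = {m: c / n_boot for m, c in selected.items()}
    inclusion = dict(zip(names, included.mean(axis=0)))
    averaged = dict(zip(names, coefs.mean(axis=0)))
    return model_freq, inclusion, averaged


# Simulated example: only x1 affects the mean; x2 is pure noise.
rng = np.random.default_rng(1)
n = 200
X = rng.normal(size=(n, 2))
mu = expit(0.3 + 1.0 * X[:, 0])
phi = 20.0
y = np.clip(rng.beta(mu * phi, (1.0 - mu) * phi), 1e-6, 1 - 1e-6)

model_freq, inclusion, averaged = bootstrap_model_average(X, y, ["x1", "x2"])
print("model selection frequencies:", model_freq)
print("variable inclusion frequencies:", inclusion)
print("model-averaged coefficients:", averaged)
```

In this sketch the selection frequencies show how often each candidate model wins across resamples, which is one simple way to look at model stability. The inclusion frequencies serve as a rough variable-importance measure. Averaging coefficients with zeros for excluded predictors is one common convention, chosen here for brevity.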

Source journal

The Annals of Applied Statistics

Self-citation rate

0.00%

Number of articles published