{"title":"贝叶斯模型平均化和正则化回归作为数据驱动模型探索的方法,以及实际考虑因素","authors":"Hyemin Han","doi":"10.3390/stats7030044","DOIUrl":null,"url":null,"abstract":"Methodological experts suggest that psychological and educational researchers should employ appropriate methods for data-driven model exploration, such as Bayesian Model Averaging and regularized regression, instead of conventional hypothesis-driven testing, if they want to explore the best prediction model. I intend to discuss practical considerations regarding data-driven methods for end-user researchers without sufficient expertise in quantitative methods. I tested three data-driven methods, i.e., Bayesian Model Averaging, LASSO as a form of regularized regression, and stepwise regression, with datasets in psychology and education. I compared their performance in terms of cross-validity indicating robustness against overfitting across different conditions. I employed functionalities widely available via R with default settings to provide information relevant to end users without advanced statistical knowledge. The results demonstrated that LASSO showed the best performance and Bayesian Model Averaging outperformed stepwise regression when there were many candidate predictors to explore. Based on these findings, I discussed appropriately using the data-driven model exploration methods across different situations from laypeople’s perspectives.","PeriodicalId":0,"journal":{"name":"","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2024-07-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"Bayesian Model Averaging and Regularized Regression as Methods for Data-Driven Model Exploration, with Practical Considerations\",\"authors\":\"Hyemin Han\",\"doi\":\"10.3390/stats7030044\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Methodological experts suggest that psychological and educational researchers should employ appropriate methods for data-driven model exploration, such as Bayesian Model Averaging and regularized regression, instead of conventional hypothesis-driven testing, if they want to explore the best prediction model. I intend to discuss practical considerations regarding data-driven methods for end-user researchers without sufficient expertise in quantitative methods. I tested three data-driven methods, i.e., Bayesian Model Averaging, LASSO as a form of regularized regression, and stepwise regression, with datasets in psychology and education. I compared their performance in terms of cross-validity indicating robustness against overfitting across different conditions. I employed functionalities widely available via R with default settings to provide information relevant to end users without advanced statistical knowledge. The results demonstrated that LASSO showed the best performance and Bayesian Model Averaging outperformed stepwise regression when there were many candidate predictors to explore. Based on these findings, I discussed appropriately using the data-driven model exploration methods across different situations from laypeople’s perspectives.\",\"PeriodicalId\":0,\"journal\":{\"name\":\"\",\"volume\":null,\"pages\":null},\"PeriodicalIF\":0.0,\"publicationDate\":\"2024-07-18\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.3390/stats7030044\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.3390/stats7030044","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 0
摘要
方法论专家建议,心理和教育研究人员若想探索最佳预测模型,应采用适当的数据驱动模型探索方法,如贝叶斯模型平均法和正则化回归,而不是传统的假设驱动测试。我打算为没有足够定量方法专业知识的最终用户研究人员讨论有关数据驱动方法的实际注意事项。我用心理学和教育学的数据集测试了三种数据驱动方法,即贝叶斯模型平均法、作为正则化回归一种形式的 LASSO 以及逐步回归法。我比较了它们在交叉有效性方面的表现,这表明它们在不同条件下对过度拟合的稳健性。我使用了 R 广泛提供的功能,并进行了默认设置,以便为不具备高级统计知识的最终用户提供相关信息。结果表明,当有许多候选预测因子需要探索时,LASSO 的性能最佳,而贝叶斯模型平均法的性能则优于逐步回归法。基于这些发现,我从非专业人士的角度讨论了在不同情况下适当使用数据驱动模型探索方法的问题。
Bayesian Model Averaging and Regularized Regression as Methods for Data-Driven Model Exploration, with Practical Considerations
Methodological experts suggest that psychological and educational researchers should employ appropriate methods for data-driven model exploration, such as Bayesian Model Averaging and regularized regression, instead of conventional hypothesis-driven testing, if they want to explore the best prediction model. I intend to discuss practical considerations regarding data-driven methods for end-user researchers without sufficient expertise in quantitative methods. I tested three data-driven methods, i.e., Bayesian Model Averaging, LASSO as a form of regularized regression, and stepwise regression, with datasets in psychology and education. I compared their performance in terms of cross-validity indicating robustness against overfitting across different conditions. I employed functionalities widely available via R with default settings to provide information relevant to end users without advanced statistical knowledge. The results demonstrated that LASSO showed the best performance and Bayesian Model Averaging outperformed stepwise regression when there were many candidate predictors to explore. Based on these findings, I discussed appropriately using the data-driven model exploration methods across different situations from laypeople’s perspectives.