Searching for correct specification in spatial probit models. Classical approaches versus Gradient Boosting algorithm
Authors: Miguel De la Llave, Fernando A. López
Journal: Spatial Statistics (Journal Article), published 2024-04-06
DOI: 10.1016/j.spasta.2024.100815
URL: https://www.sciencedirect.com/science/article/pii/S221167532400006X
Journal impact factor: 2.1 (JCR Q3, Geosciences, Multidisciplinary)
Citations: 0
Abstract
Selecting the correct specification within spatial modelling frameworks is an important research topic in spatial econometrics. The purpose of this paper is to examine and contrast two well-known model selection strategies, Specific-to-General (Stge) and General-to-Specific (Gets), in the context of spatial probit models. The results obtained from these classical methods are compared with those produced by a machine learning algorithm, Gradient Boosting. The paper includes an extensive Monte Carlo experiment comparing the performance of the three strategies with small and medium sample sizes. The results show that, under ideal conditions, both classical strategies achieve similar results for medium-sized samples, whereas for small samples Stge performs slightly better than Gets. The Gradient Boosting algorithm achieves slightly higher success rates than the classical strategies, especially with small sample sizes. Finally, the workflow of both strategies is illustrated using a well-known dataset on the probability of businesses reopening in New Orleans in the aftermath of Hurricane Katrina.
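The abstract does not state the data-generating process, weights matrix, or parameter values used in the Monte Carlo experiment. As a rough illustration only, the sketch below shows how one replication might be drawn from a spatial lag (SAR) probit model, y* = ρWy* + Xβ + ε with y = 1(y* > 0), and how a "success rate" could be computed as the share of replications in which a strategy picks the true specification. The function names `knn_weights`, `simulate_sar_probit` and `success_rate`, the k-nearest-neighbour weights, uniform coordinates and Gaussian errors are all assumptions, not the authors' design; estimating the spatial probit models and running the Stge/Gets tests are not shown.

```python
import numpy as np


def knn_weights(coords: np.ndarray, k: int = 5) -> np.ndarray:
    """Row-standardised k-nearest-neighbour weights matrix W (assumed choice of W)."""
    n = coords.shape[0]
    # Pairwise Euclidean distances; +inf on the diagonal so no unit is its own neighbour.
    d = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=-1)
    np.fill_diagonal(d, np.inf)
    w = np.zeros((n, n))
    nearest = np.argsort(d, axis=1)[:, :k]
    rows = np.repeat(np.arange(n), k)
    w[rows, nearest.ravel()] = 1.0
    # Row-standardise so each row sums to 1.
    return w / w.sum(axis=1, keepdims=True)


def simulate_sar_probit(n, rho, beta, k=5, seed=None):
    """Hypothetical DGP: y* = rho*W y* + X beta + eps, y = 1(y* > 0), eps ~ N(0, 1)."""
    rng = np.random.default_rng(seed)
    beta = np.asarray(beta, dtype=float)
    coords = rng.uniform(size=(n, 2))  # assumed random locations in the unit square
    w = knn_weights(coords, k)
    # Intercept plus (len(beta) - 1) standard normal regressors.
    x = np.column_stack([np.ones(n), rng.normal(size=(n, len(beta) - 1))])
    eps = rng.normal(size=n)
    # Reduced form y* = (I - rho W)^{-1}(X beta + eps); solve rather than invert.
    y_star = np.linalg.solve(np.eye(n) - rho * w, x @ beta + eps)
    y = (y_star > 0).astype(int)
    return y, x, w


def success_rate(selected, true_model):
    """Share of replications whose selected specification equals the true one."""
    selected = list(selected)
    return sum(m == true_model for m in selected) / len(selected)
```

In a full experiment, one would call `simulate_sar_probit` many times, apply each selection strategy to `(y, x, w)`, and pass the chosen model labels to `success_rate`. Solving the linear system instead of forming (I − ρW)⁻¹ explicitly is cheaper and numerically safer; with a row-standardised W, the system is nonsingular for |ρ| < 1.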
About the journal:
Spatial Statistics publishes articles on the theory and application of spatial and spatio-temporal statistics. It favours manuscripts that present theory generated by new applications, or in which new theory is applied to an important practical case. A purely theoretical study will only rarely be accepted. Pure case studies without methodological development are not acceptable for publication.
Spatial statistics concerns the quantitative analysis of spatial and spatio-temporal data, including their statistical dependencies, accuracy and uncertainties. Methodology for spatial statistics is typically found in probability theory, stochastic modelling and mathematical statistics, as well as in information science. Spatial statistics is used in mapping, assessing spatial data quality, sampling design optimisation, modelling of dependence structures, and drawing valid inferences from a limited set of spatio-temporal data.