Authors: Yunquan Song, Minglu Fang, Yuanfeng Wang, Yiming Hou
Journal: Spatial Statistics (IF 2.1; JCR Q3, Geosciences, Multidisciplinary)
Published: 2024-04-27
DOI: 10.1016/j.spasta.2024.100834
URL: https://www.sciencedirect.com/science/article/pii/S2211675324000253
Citations: 0
Rapid outlier detection, model selection and variable selection using penalized likelihood estimation for general spatial models
Abstract
Outliers in a data set can distort statistical inference, yet they can also carry useful information about the data, so methods for detecting and accommodating outliers remain an important topic in data analysis. For spatial data, outliers affect not only coefficient estimation but also model selection. Traditional approaches carry out outlier detection, model selection and variable selection as separate sequential steps, which makes data processing inefficient. To improve both the efficiency and the accuracy of data processing, we consider a technique, based on the general spatial model, that achieves outlier detection together with model and variable estimation in one step. In the general spatial model, we add a mean shift parameter for each data point to identify outliers. Penalized likelihood estimation (PLE) is proposed to simultaneously detect outliers and select spatial models and explanatory variables for spatial data. In numerical simulation and case analysis, the method correctly identifies multiple outliers, provides a proper spatial model, and corrects coefficient estimation without removing outliers. Compared to current methods, PLE detects outliers more quickly, and solves the optimization problem to select spatial models and explanatory variables. Calculation is easy using the solnp optimization function in R.
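The mean shift idea in the abstract can be illustrated with a deliberately simplified sketch. This is not the authors' PLE: it assumes an ordinary, non-spatial linear model y = Xβ + γ + ε, puts an L1 penalty on the per-observation shifts γ only, and solves by alternating least squares and soft-thresholding in Python with NumPy rather than with solnp in R. The function names, the penalty level `lam`, and the simulated data are all hypothetical choices made for illustration. Observations whose estimated shift is nonzero are flagged as outliers, while the coefficients are estimated without deleting any data point.

```python
import numpy as np


def soft_threshold(r, lam):
    """Shrink each entry of r toward zero by lam (proximal map of the L1 penalty)."""
    return np.sign(r) * np.maximum(np.abs(r) - lam, 0.0)


def mean_shift_fit(X, y, lam, n_iter=200, tol=1e-8):
    """Minimize 0.5 * ||y - X beta - gamma||^2 + lam * ||gamma||_1 by alternating updates.

    gamma holds one mean shift parameter per observation; a nonzero entry
    marks that observation as an outlier. (Hypothetical helper, not from the paper.)
    """
    n, p = X.shape
    beta = np.zeros(p)
    gamma = np.zeros(n)
    for _ in range(n_iter):
        # Update beta: ordinary least squares on the shift-corrected response.
        beta, *_ = np.linalg.lstsq(X, y - gamma, rcond=None)
        # Update gamma: soft-threshold the current residuals.
        gamma_new = soft_threshold(y - X @ beta, lam)
        converged = np.max(np.abs(gamma_new - gamma)) < tol
        gamma = gamma_new
        if converged:
            break
    return beta, gamma


# Simulated data (assumed for illustration): a line with small noise and three shifted points.
rng = np.random.default_rng(0)
n = 100
X = np.column_stack([np.ones(n), np.linspace(-1.0, 1.0, n)])
beta_true = np.array([1.0, 2.0])
y = X @ beta_true + 0.1 * rng.normal(size=n)
outlier_idx = np.array([5, 40, 77])
y[outlier_idx] += 10.0

beta_hat, gamma_hat = mean_shift_fit(X, y, lam=1.0)
flagged = np.flatnonzero(gamma_hat)
print("flagged observations:", flagged.tolist())
print("estimated coefficients:", beta_hat)
```

In this sketch the penalty level controls how large a residual must be before an observation is flagged. A spatial version along the lines of the abstract would also need the spatial dependence terms in the likelihood and further penalties for model and variable selection, which are omitted here.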
About the journal:
Spatial Statistics publishes articles on the theory and application of spatial and spatio-temporal statistics. It favours manuscripts that present theory generated by new applications, or in which new theory is applied to an important practical case. A purely theoretical study will only rarely be accepted. Pure case studies without methodological development are not acceptable for publication.
Spatial statistics concerns the quantitative analysis of spatial and spatio-temporal data, including their statistical dependencies, accuracy and uncertainties. Methodology for spatial statistics is typically found in probability theory, stochastic modelling and mathematical statistics as well as in information science. Spatial statistics is used in mapping, assessing spatial data quality, sampling design optimisation, modelling of dependence structures, and drawing of valid inference from a limited set of spatio-temporal data.