Flexible basis representations for modeling large non-Gaussian spatial data

IF 2.5 · Zone 2, Mathematics · JCR Q3, GEOSCIENCES, MULTIDISCIPLINARY

Spatial Statistics Pub Date : 2024-08-01 DOI:10.1016/j.spasta.2024.100841

Remy MacDonald, Benjamin Seiyon Lee

Volume 62, Article 100841. Source: https://www.sciencedirect.com/science/article/pii/S2211675324000320

Citations: 0

Abstract

Nonstationary and non-Gaussian spatial data are common in various fields, including ecology (e.g., counts of animal species), epidemiology (e.g., disease incidence counts in susceptible regions), and environmental science (e.g., remotely sensed satellite imagery). Due to modern data collection methods, the size of these datasets has grown considerably. Spatial generalized linear mixed models (SGLMMs) are a flexible class of models used to model nonstationary and non-Gaussian datasets. Despite their utility, SGLMMs can be computationally prohibitive for even moderately large datasets (e.g., 5,000 to 100,000 observed locations). To circumvent this issue, past studies have embedded nested radial basis functions into the SGLMM. However, two crucial specifications (knot placement and bandwidth parameters), which directly affect model performance, are typically fixed prior to model fitting. We propose a novel approach to model large nonstationary and non-Gaussian spatial datasets using adaptive radial basis functions. Our approach: (1) partitions the spatial domain into subregions; (2) employs reversible-jump Markov chain Monte Carlo (RJMCMC) to infer the number and location of the knots within each partition; and (3) models the latent spatial surface using partition-varying and adaptive basis functions. Through an extensive simulation study, we show that our approach provides more accurate predictions than competing methods while preserving computational efficiency. We demonstrate our approach on two environmental datasets: incidences of plant species and counts of bird species in the United States.
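To make the basis-representation idea concrete, the following is a minimal sketch of a partition-varying radial basis expansion feeding a Poisson mean. It is not the authors' implementation. Everything here is an assumption made for illustration: the Gaussian kernel form, the regular grid partition of the unit square, the rule that a knot is active only inside its own partition, and all names (`gaussian_rbf`, `partition_index`, `basis_row`, `poisson_mean`, `KNOTS`, `BANDWIDTHS`). The knots and bandwidths are fixed by hand, whereas the abstract describes inferring the number and location of knots with RJMCMC, which is not attempted here.

```python
import math

def gaussian_rbf(s, knot, bandwidth):
    """Gaussian radial basis function exp(-||s - knot||^2 / bandwidth^2) (assumed kernel form)."""
    d2 = (s[0] - knot[0]) ** 2 + (s[1] - knot[1]) ** 2
    return math.exp(-d2 / bandwidth ** 2)

def partition_index(s, n_side):
    """Index of the cell containing s in a hypothetical n_side x n_side grid over the unit square."""
    col = min(int(s[0] * n_side), n_side - 1)
    row = min(int(s[1] * n_side), n_side - 1)
    return row * n_side + col

def basis_row(s, knots_by_partition, bandwidths, n_side):
    """Basis values at location s, one entry per knot.

    Assumption: only knots in the partition that contains s are active;
    knots in every other partition contribute 0.
    """
    p = partition_index(s, n_side)
    values = []
    for q, knots in enumerate(knots_by_partition):
        for knot in knots:
            if q == p:
                values.append(gaussian_rbf(s, knot, bandwidths[q]))
            else:
                values.append(0.0)
    return values

def poisson_mean(s, beta0, weights, knots_by_partition, bandwidths, n_side):
    """Poisson mean under a log link: exp(beta0 + sum_j weight_j * basis_j(s))."""
    phi = basis_row(s, knots_by_partition, bandwidths, n_side)
    eta = beta0 + sum(w * b for w, b in zip(weights, phi))
    return math.exp(eta)

# Hand-picked illustrative knots: partitions may hold different numbers of knots
# and use different bandwidths.
KNOTS = [
    [(0.25, 0.25)],              # partition 0: x < 0.5, y < 0.5
    [(0.75, 0.25), (0.6, 0.4)],  # partition 1: x >= 0.5, y < 0.5
    [(0.25, 0.75)],              # partition 2: x < 0.5, y >= 0.5
    [(0.75, 0.75)],              # partition 3: x >= 0.5, y >= 0.5
]
BANDWIDTHS = [0.3, 0.2, 0.3, 0.4]
WEIGHTS = [0.5, -0.2, 0.8, 0.1, 0.3]  # one hypothetical coefficient per knot

if __name__ == "__main__":
    location = (0.7, 0.3)
    print(basis_row(location, KNOTS, BANDWIDTHS, 2))
    print(poisson_mean(location, 0.0, WEIGHTS, KNOTS, BANDWIDTHS, 2))
```

In this sketch the cost of evaluating the latent surface at a location depends only on the knots in that location's partition. This is one plausible reading of how partitioning helps with computation, but the abstract does not spell out the mechanism.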


Source journal

Spatial Statistics (GEOSCIENCES, MULTIDISCIPLINARY; MATHEMATICS, INTERDISCIPLINARY APPLICATIONS)

CiteScore: 4.00
Self-citation rate: 21.70%
Review time: 55 days

About the journal: Spatial Statistics publishes articles on the theory and application of spatial and spatio-temporal statistics. It favours manuscripts that present theory generated by new applications, or in which new theory is applied to an important practical case. A purely theoretical study will only rarely be accepted. Pure case studies without methodological development are not acceptable for publication. Spatial statistics concerns the quantitative analysis of spatial and spatio-temporal data, including their statistical dependencies, accuracy and uncertainties. Methodology for spatial statistics is typically found in probability theory, stochastic modelling and mathematical statistics as well as in information science. Spatial statistics is used in mapping, assessing spatial data quality, sampling design optimisation, modelling of dependence structures, and drawing of valid inference from a limited set of spatio-temporal data.