A generalized additive model (GAM) approach to principal component analysis of geographic data

IF 2.1 · Zone 2 (Mathematics) · JCR Q3, GEOSCIENCES, MULTIDISCIPLINARY
Francisco de Asís López, Celestino Ordóñez, Javier Roca-Pardiñas
Journal: Spatial Statistics. Published: 2023-12-29. DOI: 10.1016/j.spasta.2023.100806. Article page: https://www.sciencedirect.com/science/article/pii/S2211675323000817

Abstract

Geographically Weighted Principal Component Analysis (GWPCA) extends classical PCA to handle the spatial heterogeneity of geographical data. This heterogeneity yields a variance–covariance matrix that is not stationary but changes with geographical location. Despite its usefulness, GWPCA has some unresolved issues, such as how to choose an appropriate bandwidth (the size of the neighbourhood) as a function of the number of retained components. In this work, we address the calculation of principal components for geographical data from a new perspective that avoids this problem. Specifically, we propose a scale-location model that uses generalized additive models (GAMs) to estimate the mean of each variable and a correlation matrix relating the variables, both depending on spatial location. Note that although we deal with geographic data, our methodology cannot be considered strictly spatial, since we assume that the error term has no spatial correlation structure.
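The abstract does not state the model equations. As a rough guide only, a generic scale-location model of the kind described could be written as follows. The symbols $m_j$, $\sigma_j$, $\rho_{jk}$ and $\varepsilon_j$ are notation assumed here for illustration and may differ from the paper's own formulation.

```latex
% Assumed notation: s is a spatial location, X_j(s) is the j-th variable at s.
X_j(s) = m_j(s) + \sigma_j(s)\,\varepsilon_j(s), \qquad j = 1, \dots, p,
\\
\operatorname{E}[\varepsilon_j(s)] = 0, \quad
\operatorname{Var}[\varepsilon_j(s)] = 1, \quad
\operatorname{Corr}[\varepsilon_j(s), \varepsilon_k(s)] = \rho_{jk}(s),
\\
\Sigma(s) = D(s)\,R(s)\,D(s), \qquad
D(s) = \operatorname{diag}\bigl(\sigma_1(s), \dots, \sigma_p(s)\bigr), \qquad
R(s) = \bigl(\rho_{jk}(s)\bigr)_{j,k=1}^{p}.
```

Under this reading, each of $m_j$, $\sigma_j$ and $\rho_{jk}$ would be a smooth function of location, and the principal components at a location $s$ would come from the eigen-decomposition of $R(s)$ (or of $\Sigma(s)$).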

Our approach does not require calculating an optimal bandwidth as a function of the number of components retained in the analysis. Instead, the covariance matrix is estimated using smooth functions adapted to the data, so the degree of smoothness can differ for each element of the matrix. The proposed methodology was tested on simulated data and compared with GWPCA, and it gave a better representation of the data structure. Finally, we illustrate the potential of our method on a real-data problem involving air pollution and socioeconomic factors.
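The idea of estimating each element of a location-dependent correlation matrix with its own smooth function, and then extracting principal components at a given site, can be sketched as follows. This is a minimal illustration and not the authors' implementation. A quadratic polynomial least-squares fit stands in for the GAM smooths, and the function names (`basis`, `fit_smooth`, `local_pca`) and the simulated data are hypothetical.

```python
import numpy as np

def basis(coords):
    """Quadratic polynomial basis in (u, v): a crude stand-in for GAM smooth terms."""
    u, v = coords[:, 0], coords[:, 1]
    return np.column_stack([np.ones_like(u), u, v, u * v, u**2, v**2])

def fit_smooth(coords, y):
    """Least-squares fit of y on the basis; returns a predictor for new coordinates."""
    beta, *_ = np.linalg.lstsq(basis(coords), y, rcond=None)
    return lambda new: basis(np.atleast_2d(new)) @ beta

def local_pca(coords, X, site):
    """Estimate the correlation matrix at `site` and return it with its eigenpairs."""
    n, p = X.shape
    E = np.empty((n, p))
    for j in range(p):
        # Location: smooth mean surface for variable j.
        resid = X[:, j] - fit_smooth(coords, X[:, j])(coords)
        # Scale: smooth variance surface from squared residuals (kept positive).
        var_hat = fit_smooth(coords, resid**2)(coords)
        E[:, j] = resid / np.sqrt(np.clip(var_hat, 1e-8, None))
    R = np.eye(p)
    for j in range(p):
        for k in range(j + 1, p):
            # Each off-diagonal element gets its own smooth of the residual products.
            rho = fit_smooth(coords, E[:, j] * E[:, k])(site)[0]
            R[j, k] = R[k, j] = np.clip(rho, -0.99, 0.99)
    # Caveat: for p > 2, element-wise estimates need not give a positive
    # semidefinite matrix; this sketch does not correct for that.
    eigvals, eigvecs = np.linalg.eigh(R)
    order = np.argsort(eigvals)[::-1]
    return R, eigvals[order], eigvecs[:, order]

# Simulated example: the correlation between two variables changes from west to east.
rng = np.random.default_rng(0)
n = 2000
coords = rng.uniform(size=(n, 2))
rho_true = 0.8 * (2 * coords[:, 0] - 1)
z1 = rng.standard_normal(n)
z2 = rho_true * z1 + np.sqrt(1 - rho_true**2) * rng.standard_normal(n)
X = np.column_stack([1 + coords[:, 0] + z1,
                     2 * coords[:, 1] + (1 + coords[:, 1]) * z2])

R_west, vals_west, vecs_west = local_pca(coords, X, np.array([0.1, 0.5]))
R_east, vals_east, vecs_east = local_pca(coords, X, np.array([0.9, 0.5]))
print("correlation west/east:", R_west[0, 1], R_east[0, 1])
print("eigenvalues west:", vals_west)
```

No bandwidth appears anywhere in this sketch: the spatial variation comes entirely from the fitted smooth surfaces. A real analysis would replace the polynomial fit with penalized smooths from a GAM library, so that the smoothness of each element is chosen from the data.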

Source journal
Spatial Statistics
Categories: GEOSCIENCES, MULTIDISCIPLINARY; MATHEMATICS, INTERDISCIPLINARY APPLICATIONS
CiteScore: 4.00
Self-citation rate: 21.70%
Articles published: 89
Review time: 55 days
About the journal: Spatial Statistics publishes articles on the theory and application of spatial and spatio-temporal statistics. It favours manuscripts that present theory generated by new applications, or in which new theory is applied to an important practical case. A purely theoretical study will only rarely be accepted. Pure case studies without methodological development are not acceptable for publication. Spatial statistics concerns the quantitative analysis of spatial and spatio-temporal data, including their statistical dependencies, accuracy and uncertainties. Methodology for spatial statistics is typically found in probability theory, stochastic modelling and mathematical statistics as well as in information science. Spatial statistics is used in mapping, assessing spatial data quality, sampling design optimisation, modelling of dependence structures, and drawing of valid inference from a limited set of spatio-temporal data.