High-Resolution Spatiotemporal Forecasting with Missing Observations Including an Application to Daily Particulate Matter 2.5 Concentrations in Jakarta Province, Indonesia

IF 2.2 · Zone 3 (Mathematics) · JCR Q1 MATHEMATICS

Mathematics · Pub Date: 2024-09-17 · DOI: 10.3390/math12182899

I Gede Nyoman Mindra Jaya, Henk Folmer

Citations: 0

Abstract

Accurate forecasting of high-resolution particulate matter 2.5 (PM2.5) levels is essential for the development of public health policy. However, datasets used for this purpose often contain missing observations. This study presents a two-stage approach to handle this problem. The first stage is a multivariate spatial time series (MSTS) model, used to generate forecasts for the sampled spatial units and to impute missing observations. The MSTS model exploits similarities between the temporal patterns of the spatial units' time series to impute missing data across space. The second stage is a high-resolution prediction model, which generates predictions covering the entire study domain. The second stage faces the big-N problem, which gives rise to heavy memory and computational demands. As a solution, we propose a Gaussian Markov random field (GMRF) for the innovations, with the Matérn covariance matrix obtained from the corresponding Gaussian field (GF) matrix by means of the stochastic partial differential equation (SPDE) method and the finite element method (FEM). For inference, we propose Bayesian statistics with the integrated nested Laplace approximation (INLA), as implemented in the R-INLA package. The approach is demonstrated using daily data collected from 13 PM2.5 monitoring stations in Jakarta Province, Indonesia, for 1 January–31 December 2022. The first stage generates PM2.5 forecasts for the 13 monitoring stations for the period 1–31 January 2023, imputing missing data by means of the MSTS model. To capture temporal trends in the PM2.5 concentrations, the model applies a first-order autoregressive process and a seasonal process. The second stage creates a high-resolution map for the period 1–31 January 2023, covering both sampled and non-sampled spatiotemporal units. It uses the MSTS-generated PM2.5 predictions for the sampled spatiotemporal units, together with observations of the covariates altitude, population density, and rainfall for sampled and non-sampled spatiotemporal units. For the spatially correlated random effects, we apply a first-order random walk process. Validation of the out-of-sample forecasts indicates a strong model fit, with a low mean squared error (0.001), mean absolute error (0.037), and mean absolute percentage error (0.041), and a high R² value (0.855). The analysis reveals that altitude and rainfall have a negative effect on PM2.5 concentrations, while population density has a positive effect. Specifically, a one-meter increase in altitude is associated with a 7.8% decrease in PM2.5, a one-person increase in population density with a 7.0% rise, and a one-millimeter increase in rainfall with a 3.9% decrease. The paper contributes to the forecasting of high-resolution PM2.5 levels, which is essential for providing detailed, accurate information for public health policy. The approach offers a new method for jointly addressing missing data and high-resolution forecasting.
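The four validation measures quoted in the abstract (mean squared error, mean absolute error, mean absolute percentage error, and R²) can be illustrated with a minimal sketch. This is not the authors' code: the function name `forecast_metrics` and the sample values are hypothetical, MAPE is assumed to be reported as a fraction rather than a percentage, and R² is assumed to be the coefficient of determination 1 − SS_res/SS_tot; the paper may define these differently.

```python
import numpy as np

def forecast_metrics(observed, predicted):
    """Return MSE, MAE, MAPE (as a fraction) and R^2 for paired values.

    Hypothetical helper for illustration only; not taken from the paper.
    """
    y = np.asarray(observed, dtype=float)
    yhat = np.asarray(predicted, dtype=float)
    # Drop pairs in which either value is missing (NaN).
    keep = ~(np.isnan(y) | np.isnan(yhat))
    y, yhat = y[keep], yhat[keep]
    err = y - yhat
    mse = float(np.mean(err ** 2))
    mae = float(np.mean(np.abs(err)))
    # MAPE as a fraction; assumes no retained observed value is zero.
    mape = float(np.mean(np.abs(err / y)))
    # R^2 as 1 - SS_res / SS_tot (assumed definition).
    ss_res = float(np.sum(err ** 2))
    ss_tot = float(np.sum((y - y.mean()) ** 2))
    r2 = 1.0 - ss_res / ss_tot
    return {"mse": mse, "mae": mae, "mape": mape, "r2": r2}

# Made-up values, not data from the study; the NaN pair is skipped.
observed = [1.0, 2.0, np.nan, 4.0, 5.0]
predicted = [1.1, 1.9, 3.0, 4.2, 4.8]
metrics = forecast_metrics(observed, predicted)
print(metrics)
```

In this sketch, pairs with a missing observation are excluded from scoring, which is one simple convention for out-of-sample validation when the hold-out data themselves contain gaps.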

Source journal

Mathematics (Mathematics: General Mathematics)

CiteScore: 4.00
Self-citation rate: 16.70%
Articles published: 4032
Review time: 21.9 days

Journal introduction: Mathematics (ISSN 2227-7390) is an international, open access journal that provides an advanced forum for studies related to the mathematical sciences. It is devoted exclusively to the publication of high-quality reviews, regular research papers, and short communications in all areas of pure and applied mathematics. Mathematics also publishes timely and thorough survey articles on current trends, new theoretical techniques, novel ideas, and new mathematical tools in different branches of mathematics.