Spatiotemporal stacking method with daily-cycle restrictions for reconstructing missing hourly PM2.5 records
Authors: Chuanfa Chen, Kunyu Li
Journal: Transactions in GIS (impact factor 2.1; JCR Q2, Geography)
Published: 2024-02-13
DOI: 10.1111/tgis.13141 (https://doi.org/10.1111/tgis.13141)
Citations: 0
Abstract
Missing values compromise the reliability of hourly PM2.5 data from air quality monitoring stations and hinder detailed analysis of the records. This paper presents a spatiotemporal (ST) stacking machine learning (ML) method with daily-cycle restrictions for reconstructing missing hourly PM2.5 records. First, the ST neighbors of the target station with missing values are selected at a daily scale. Second, the non-null data within the ST neighbors are re-interpolated with an iterative P-BSHADE interpolation process. Third, a stacking ML model is trained with the re-interpolated values and several environmental factors associated with PM2.5 as predictors and the observed PM2.5 as the dependent (target) variable. Finally, the missing values are reconstructed by feeding the predictors into the trained stacking model. The method was evaluated on hourly PM2.5 data from the Beijing-Tianjin-Hebei region at daily missing ratios of 10%, 30%, and 50%, and its accuracy was compared with that of four contemporary ST interpolation methods. The proposed method outperformed these methods, reducing the average root mean square error (RMSE) and mean absolute error (MAE) by at least 40.6% and 40.1%, respectively. It also recovered extreme values in the hourly PM2.5 records, whereas the comparison methods tended to overestimate low values and underestimate high values. Overall, the proposed method is a viable and efficient way to recover missing values in hourly PM2.5 records that show evident daily periodic patterns.
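The evaluation protocol described in the abstract (hide a fixed share of each day's hourly values, reconstruct them, then score with RMSE and MAE) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the data are synthetic, every function name is hypothetical, and the reconstruction step is a simple hour-of-day mean standing in for the P-BSHADE plus stacking pipeline, which the abstract does not specify in enough detail to reproduce.

```python
import math
import random


def mask_daily(series, ratio, rng):
    """Hide `ratio` of the hourly values in each 24-hour day.

    Returns the masked series (None marks a gap) and the hidden indices.
    """
    masked = list(series)
    hidden = []
    for day_start in range(0, len(series), 24):
        hours = list(range(day_start, min(day_start + 24, len(series))))
        k = round(ratio * len(hours))
        for i in rng.sample(hours, k):
            masked[i] = None
            hidden.append(i)
    return masked, sorted(hidden)


def fill_hour_of_day_mean(masked):
    """Stand-in reconstruction (NOT the paper's method): fill each gap with
    the mean of the observed values at the same hour of day."""
    sums = [0.0] * 24
    counts = [0] * 24
    for i, v in enumerate(masked):
        if v is not None:
            sums[i % 24] += v
            counts[i % 24] += 1
    overall = sum(sums) / max(sum(counts), 1)  # fallback if an hour is never observed
    return [
        v if v is not None
        else (sums[i % 24] / counts[i % 24] if counts[i % 24] else overall)
        for i, v in enumerate(masked)
    ]


def rmse(truth, pred, idx):
    """Root mean square error over the hidden indices only."""
    return math.sqrt(sum((truth[i] - pred[i]) ** 2 for i in idx) / len(idx))


def mae(truth, pred, idx):
    """Mean absolute error over the hidden indices only."""
    return sum(abs(truth[i] - pred[i]) for i in idx) / len(idx)


def relative_reduction(baseline_error, method_error):
    """Percentage by which `method_error` is lower than `baseline_error`."""
    return 100.0 * (baseline_error - method_error) / baseline_error


rng = random.Random(0)
# Synthetic hourly series for 30 days: a daily cycle plus Gaussian noise (assumed values).
truth = [
    60 + 25 * math.sin(2 * math.pi * (h % 24) / 24) + rng.gauss(0, 5)
    for h in range(24 * 30)
]

results = {}
for ratio in (0.1, 0.3, 0.5):
    masked, hidden = mask_daily(truth, ratio, rng)
    filled = fill_hour_of_day_mean(masked)
    results[ratio] = (len(hidden), rmse(truth, filled, hidden), mae(truth, filled, hidden))
    print(f"ratio={ratio:.0%} hidden={results[ratio][0]} "
          f"RMSE={results[ratio][1]:.2f} MAE={results[ratio][2]:.2f}")
```

Errors are scored only on the hidden indices, since those are the only positions where a reconstruction can differ from the truth. A figure such as the reported reduction of at least 40.6% in RMSE corresponds to `relative_reduction(baseline_rmse, proposed_rmse)` computed against a comparison method.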
About the journal:
Transactions in GIS is an international journal that provides a forum for high-quality, original research articles, review articles, short notes, and book reviews that focus on:
- practical and theoretical issues influencing the development of GIS
- the collection, analysis, modelling, interpretation, and display of spatial data within GIS
- the connections between GIS and related technologies
- new GIS applications that help to solve problems affecting the natural or built environments, or business