A gridded air quality forecast through fusing site-available machine learning predictions from RFSML v1.0 and chemical transport model results from GEOS-Chem v13.1.0 using the ensemble Kalman filter

IF 4; Zone 3, Earth Sciences; Q1 GEOSCIENCES, MULTIDISCIPLINARY

Geoscientific Model Development Pub Date : 2023-08-29 DOI:10.5194/gmd-16-4867-2023

Li Fang, Jianbing Jin, A. Segers, H. Liao, Ke Li, Bufan Xu, Wei Han, Mijie Pang, H. Lin


Citations: 0

Abstract

Statistical methods, particularly machine learning models, have become popular for air quality prediction. These models are commonly trained on historical measurements collected independently at environmental monitoring stations, and they produce operational forecasts using real-time ambient pollutant observations as inputs. Consequently, these high-quality machine learning models provide predictions only at the monitoring sites and cannot serve as an operational forecast on their own. In contrast, deterministic chemical transport models (CTMs), which simulate the full life cycles of air pollutants, provide predictions that are continuous in three-dimensional space. Despite this benefit, CTM predictions are typically biased, particularly at fine scales, owing to complex error sources in the emission, transport, and removal of pollutants. In this study, we proposed fusing a site-available machine learning prediction, from our regional feature selection-based machine learning model (RFSML v1.0), with a CTM prediction. Compared to a pure machine learning model, the fusion system provides a gridded prediction with relatively high accuracy. The prediction fusion was conducted using the ensemble Kalman filter (EnKF), which is grounded in Bayesian theory. The background error covariance was an essential part of the assimilation process. Ensemble CTM predictions driven by perturbed emission inventories were first used to represent the spatial covariance statistics, which could resolve the main part of the CTM error. In addition, a covariance inflation algorithm was designed to amplify the ensemble perturbations and so account for model errors other than the uncertainty in emission inputs. Model evaluation tests were conducted against independent measurements. Our EnKF-based prediction fusion outperformed the pure CTM. Moreover, covariance inflation further improved the fused prediction, particularly in cases of severe underestimation.
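To make the fusion step concrete, the following is a minimal sketch of one EnKF analysis step in Python with NumPy. It is not the authors' implementation. The abstract does not say which EnKF variant or which inflation algorithm was used, so the sketch assumes a stochastic (perturbed-observation) EnKF and a constant multiplicative inflation factor as stand-ins. The function names (`inflate`, `enkf_fuse`), the one-dimensional six-cell grid, the site locations, and all concentration values are hypothetical and chosen only for illustration.

```python
import numpy as np


def inflate(ensemble, factor):
    """Multiplicative inflation: scale perturbations about the ensemble mean."""
    mean = ensemble.mean(axis=1, keepdims=True)
    return mean + factor * (ensemble - mean)


def enkf_fuse(ensemble, site_idx, site_pred, site_err_std, inflation=1.0, seed=0):
    """One stochastic EnKF analysis step (illustrative, not the paper's code).

    ensemble      (n_grid, n_members) CTM fields from perturbed emissions
    site_idx      (n_sites,) grid-cell index of each monitoring site
    site_pred     (n_sites,) ML predictions, used as the "observations"
    site_err_std  assumed standard deviation of the ML prediction error
    """
    rng = np.random.default_rng(seed)
    n_members = ensemble.shape[1]
    n_sites = len(site_idx)

    ens = inflate(ensemble, inflation)
    anomalies = ens - ens.mean(axis=1, keepdims=True)  # X'
    hx = ens[site_idx, :]                              # H X (H selects site cells)
    hx_anom = anomalies[site_idx, :]                   # H X'

    # Sample covariances P H^T and H P H^T, plus a diagonal R (assumption)
    pht = anomalies @ hx_anom.T / (n_members - 1)
    hpht = hx_anom @ hx_anom.T / (n_members - 1)
    r = site_err_std ** 2 * np.eye(n_sites)

    # Kalman gain K = P H^T (H P H^T + R)^-1, via a linear solve
    gain = np.linalg.solve(hpht + r, pht.T).T

    # Perturbed observations, centred so their mean equals site_pred
    noise = rng.normal(0.0, site_err_std, size=(n_sites, n_members))
    noise -= noise.mean(axis=1, keepdims=True)
    obs = site_pred[:, None] + noise

    return ens + gain @ (obs - hx)


# Toy example: a CTM ensemble that underestimates at two monitoring sites.
rng = np.random.default_rng(42)
base = np.array([30.0, 45.0, 60.0, 50.0, 35.0, 25.0])     # made-up concentrations
scale = rng.lognormal(mean=0.0, sigma=0.3, size=(1, 20))  # emission perturbations
ensemble = base[:, None] * scale                          # 6 cells x 20 members
site_idx = np.array([1, 4])
site_pred = np.array([80.0, 60.0])

background = ensemble.mean(axis=1)
fused = enkf_fuse(ensemble, site_idx, site_pred, site_err_std=5.0).mean(axis=1)
print("background:", np.round(background, 1))
print("fused:     ", np.round(fused, 1))
```

In this sketch the ensemble covariance spreads the site-level corrections to grid cells that have no monitoring site, which is how a site-only prediction becomes a gridded one. Passing `inflation` greater than 1 widens the ensemble spread, so the filter trusts the CTM background less and gives more weight to the site predictions.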


Source journal

Geoscientific Model Development (GEOSCIENCES, MULTIDISCIPLINARY)

CiteScore: 8.60
Self-citation rate: 9.80%
Articles published: 352
Review time: 6-12 weeks

About the journal: Geoscientific Model Development (GMD) is an international scientific journal dedicated to the publication and public discussion of the description, development, and evaluation of numerical models of the Earth system and its components. The following manuscript types can be considered for peer-reviewed publication:
* geoscientific model descriptions, from statistical models to box models to GCMs;
* development and technical papers, describing developments such as new parameterizations or technical aspects of running models such as the reproducibility of results;
* new methods for assessment of models, including work on developing new metrics for assessing model performance and novel ways of comparing model results with observational data;
* papers describing new standard experiments for assessing model performance or novel ways of comparing model results with observational data;
* model experiment descriptions, including experimental details and project protocols;
* full evaluations of previously published models.