Akram Seifi , Somayeh Soltani-Gerdefaramarzi , Mumtaz Ali
Atmospheric Pollution Research, Volume 16, Issue 11, Article 102654. Published 2025-07-16. DOI: 10.1016/j.apr.2025.102654
Impact factor: 3.5; JCR: Q2 (Environmental Sciences)
https://www.sciencedirect.com/science/article/pii/S1309104225002569
Citations: 0
Uncertainty assessment based on data decomposition and Boruta-driven extreme gradient boosting to predict spatiotemporal urban air dust heavy metal index
Abstract
Accurate prediction of urban air dust pollutants is essential for public health and environmental management, and reliable prediction of air pollution caused by heavy metals in these areas is particularly important. This study develops, for the first time, an ensemble approach based on multivariate variational mode decomposition (MVMD) and extreme gradient boosting (XGBoost), integrated with the Bayesian optimizer of Optuna and different feature selection techniques, to predict the spatiotemporal distribution of the pollution load index (PLI) in the Yazd urban area, Iran. For comparison, gated recurrent unit (GRU) network, adaptive neuro-fuzzy inference system (ANFIS), and multilayer perceptron (MLP) models were also developed. Variables including meteorological data, heavy metal concentrations in roof dust, and distance to pollution sources were gathered. The seasonal data were analyzed using the Boruta feature selection approach (BFSA), SHapley Additive exPlanations (SHAP), and wavelet methods to identify valuable and easily accessible variables for predicting PLI. The results confirmed that BFSA selected the most important features more effectively than SHAP and the wavelet technique, yielding a cost-effective input vector of readily available variables: Max WD, Min RH, Cd, and Zn. Moreover, the XGBoost model showed high prediction accuracy for PLI, with R² = 0.90, RMSE = 0.08, and MAE = 0.06. Furthermore, following a stationarity test applied to all input variables, Max WD and Min RH were decomposed by MVMD into three intrinsic mode functions (IMFs). These IMFs, along with Cd and Zn, were used as the input vector of XGBoost to create the final model for predicting temporal uncertainty and generating seasonal urban spatiotemporal maps. The uncertainty evaluation showed that MVMD-XGBoost captured 83.33 %, 96.67 %, 63.33 %, and 68.97 % of the observed data within the 95 % confidence interval in spring, summer, autumn, and winter, respectively. These findings allow decision-makers to reduce air pollution monitoring costs and enhance control measures by leveraging readily available variables.
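The accuracy metrics and the interval coverage quoted in the abstract can be illustrated with a minimal sketch in plain Python. This is not the authors' code: the function names, the toy numbers, and the choice of the coefficient-of-determination form of R² are assumptions made for illustration (the abstract does not say which R² definition or how many samples per season were used).

```python
import math
from typing import Sequence


def rmse(obs: Sequence[float], pred: Sequence[float]) -> float:
    """Root mean squared error between observed and predicted values."""
    return math.sqrt(sum((o - p) ** 2 for o, p in zip(obs, pred)) / len(obs))


def mae(obs: Sequence[float], pred: Sequence[float]) -> float:
    """Mean absolute error between observed and predicted values."""
    return sum(abs(o - p) for o, p in zip(obs, pred)) / len(obs)


def r2(obs: Sequence[float], pred: Sequence[float]) -> float:
    """Coefficient of determination, 1 - SS_res / SS_tot (assumed definition)."""
    mean_obs = sum(obs) / len(obs)
    ss_res = sum((o - p) ** 2 for o, p in zip(obs, pred))
    ss_tot = sum((o - mean_obs) ** 2 for o in obs)
    return 1.0 - ss_res / ss_tot


def coverage_percent(
    obs: Sequence[float], lower: Sequence[float], upper: Sequence[float]
) -> float:
    """Percentage of observations that fall inside [lower, upper] bounds."""
    inside = sum(1 for o, lo, hi in zip(obs, lower, upper) if lo <= o <= hi)
    return 100.0 * inside / len(obs)


# Toy PLI-like values (hypothetical, not data from the study).
toy_obs = [1.0, 1.2, 0.9, 1.5]
toy_pred = [1.1, 1.1, 1.0, 1.4]
toy_lower = [0.9, 1.0, 0.95, 1.2]
toy_upper = [1.3, 1.3, 1.2, 1.6]

print("RMSE:", rmse(toy_obs, toy_pred))
print("MAE:", mae(toy_obs, toy_pred))
print("R2:", r2(toy_obs, toy_pred))
print("Coverage (%):", coverage_percent(toy_obs, toy_lower, toy_upper))
```

As a purely arithmetic aside, 25 of 30 observations inside the band gives 83.33 %, the same form as the spring figure; the actual seasonal sample counts are not stated in the abstract.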
About the journal:
Atmospheric Pollution Research (APR) is an international journal designed for the publication of articles on air pollution. Papers should present novel experimental results, theory and modeling of air pollution on local, regional, or global scales. Areas covered are research on inorganic, organic, and persistent organic air pollutants, air quality monitoring, air quality management, atmospheric dispersion and transport, air-surface (soil, water, and vegetation) exchange of pollutants, dry and wet deposition, indoor air quality, exposure assessment, health effects, satellite measurements, natural emissions, atmospheric chemistry, greenhouse gases, and effects on climate change.