Uncertainty assessment based on data decomposition and Boruta-driven extreme gradient boosting to predict spatiotemporal urban air dust heavy metal index

Impact factor 3.5; Zone 3 (Environmental Science and Ecology); JCR Q2 (Environmental Sciences)
Akram Seifi, Somayeh Soltani-Gerdefaramarzi, Mumtaz Ali
Journal: Atmospheric Pollution Research, volume 16, issue 11, Article 102654
Publication date: 2025-07-16
DOI: 10.1016/j.apr.2025.102654
URL: https://www.sciencedirect.com/science/article/pii/S1309104225002569
Publication type: Journal Article
Citation count: 0

Abstract

Accurate prediction of urban air dust pollutants is essential for public health and environmental management, and reliable prediction of the air pollution caused by heavy metals in these areas is especially important. This study develops, for the first time, an ensemble approach based on multivariate variational mode decomposition (MVMD) and extreme gradient boosting (XGBoost), integrated with the Bayesian optimizer of Optuna and different feature selection techniques, to predict the spatiotemporal distribution of the pollution load index (PLI) in the Yazd urban area, Iran. For comparison, gated recurrent unit (GRU) network, adaptive neuro-fuzzy inference system (ANFIS), and multilayer perceptron (MLP) models were also developed. Variables including meteorological data, heavy metal concentrations in roof dust, and distance to pollution sources were gathered. The seasonal data of these variables were analyzed using the Boruta feature selection approach (BFSA), SHapley Additive exPlanations (SHAP), and wavelet methods to identify valuable and easily accessible variables for predicting the PLI. The results confirmed that BFSA is more capable of selecting the most important features than the SHAP and wavelet techniques, and that it provides a cost-effective input vector of readily available variables: Max WD, Min RH, Cd, and Zn. Moreover, the XGBoost model shows high prediction accuracy for the PLI, with R² = 0.90, RMSE = 0.08, and MAE = 0.06. Furthermore, following a stationarity test applied to all input variables, Max WD and Min RH were decomposed into three intrinsic mode functions (IMFs) using the MVMD method. These IMFs, along with Cd and Zn, were used as the input vector of XGBoost to create the final model for predicting temporal uncertainty and generating seasonal urban spatiotemporal maps. The evaluation of uncertainties demonstrated that MVMD-XGBoost captured 83.33%, 96.67%, 63.33%, and 68.97% of the observed data within the 95% confidence interval in spring, summer, autumn, and winter, respectively. The findings of this study allow decision-makers to reduce air pollution monitoring costs and enhance control measures by leveraging readily available variables.
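
The accuracy metrics (R², RMSE, MAE) and the share of observations falling inside a 95% interval, as reported in the abstract, can be illustrated with a minimal Python sketch. Several things here are assumptions rather than details from the paper: R² is taken as the coefficient of determination (the authors may instead use the squared correlation coefficient), the interval bounds are supplied by the caller (the abstract does not say how they were derived), and the function names and the four sample values are hypothetical placeholders, not study data.

```python
import math


def regression_metrics(observed, predicted):
    """Return (R2, RMSE, MAE) for paired observed and predicted values.

    R2 is computed as 1 - SS_res / SS_tot (coefficient of determination),
    which is an assumption about the definition used in the paper.
    """
    n = len(observed)
    mean_obs = sum(observed) / n
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    r2 = 1.0 - ss_res / ss_tot
    rmse = math.sqrt(ss_res / n)
    mae = sum(abs(o - p) for o, p in zip(observed, predicted)) / n
    return r2, rmse, mae


def interval_coverage(observed, lower, upper):
    """Percentage of observations lying inside [lower, upper] (bounds inclusive)."""
    inside = sum(1 for o, lo, hi in zip(observed, lower, upper) if lo <= o <= hi)
    return 100.0 * inside / len(observed)


# Hypothetical PLI-like values, for illustration only.
observed = [1.0, 1.2, 0.9, 1.4]
predicted = [1.1, 1.1, 1.0, 1.0]
# Hypothetical symmetric band around each prediction.
lower = [p - 0.15 for p in predicted]
upper = [p + 0.15 for p in predicted]

r2, rmse, mae = regression_metrics(observed, predicted)
coverage = interval_coverage(observed, lower, upper)
print(f"R2={r2:.3f} RMSE={rmse:.3f} MAE={mae:.3f} coverage={coverage:.2f}%")
```

Applied per season, `interval_coverage` would give one percentage for each of spring, summer, autumn, and winter, which is the form in which the abstract reports its uncertainty results.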


Source journal: Atmospheric Pollution Research (Environmental Sciences)
CiteScore: 8.30
Self-citation rate: 6.70%
Articles published: 256
Review time: 36 days
Journal description: Atmospheric Pollution Research (APR) is an international journal designed for the publication of articles on air pollution. Papers should present novel experimental results, theory and modeling of air pollution on local, regional, or global scales. Areas covered are research on inorganic, organic, and persistent organic air pollutants, air quality monitoring, air quality management, atmospheric dispersion and transport, air-surface (soil, water, and vegetation) exchange of pollutants, dry and wet deposition, indoor air quality, exposure assessment, health effects, satellite measurements, natural emissions, atmospheric chemistry, greenhouse gases, and effects on climate change.