Air pollution prediction with advanced preprocessing and deep ensemble learning

IF 3.9 | CAS Zone 3, Environmental Science & Ecology | JCR Q2, ENVIRONMENTAL SCIENCES
Gaurav Narkhede , Mustafa Poonawala , Atharva Sonawane , Anil Hiwale , Arvind R. Singh
Atmospheric Pollution Research, Vol. 16, Issue 10, Article 102610 (journal article)
Published: 2025-06-07
DOI: 10.1016/j.apr.2025.102610
URL: https://www.sciencedirect.com/science/article/pii/S1309104225002120
Citations: 0

Abstract

This study investigates how model selection and preprocessing choices affect air pollution prediction performance, with a view to supporting the Sustainable Development Goals (SDGs). Training accurate predictive models requires effective handling of missing (null) values in environmental datasets. To address this, missing values are imputed with Probabilistic Principal Component Analysis (PPCA) and with an Extra Trees Regressor, and the imputed data are then scaled with a Robust Scaler, a Min-Max Scaler, and a Standard Scaler. A systematic comparison of these preprocessing combinations showed that PPCA was the most suitable method for imputing missing values in the air quality data, while the Robust Scaler gave the most reliable and accurate scaling. Stochastic Gradient Descent (SGD) is also used as the optimization technique to improve model performance. A Weighted Average ensemble trained on PPCA-imputed, Robust-scaled data delivered the best predictive performance of the configurations tested. The study points to further gains from additional ensemble techniques and model optimization strategies, opening avenues for future work on improving prediction accuracy and on progress toward the SDGs linked to environmental sustainability.
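The abstract names the preprocessing and ensembling steps but not their exact configuration, so the following is only a minimal NumPy sketch of that kind of pipeline under stated assumptions, not the authors' implementation. `ppca_impute` fits the closed-form maximum-likelihood PPCA solution (Tipping and Bishop) and repeatedly replaces missing cells with the model's posterior-mean reconstruction, a simplified alternative to the full missing-data EM algorithm. `robust_scale` centers each feature on its median and divides by the interquartile range, as a Robust Scaler does. `sgd_linear` is a linear regressor trained by mini-batch SGD, and `knn` is a hypothetical second base model. The ensemble weights are set proportional to inverse validation MSE, which is an assumption, since the abstract does not say how the weights were chosen. All function names, hyperparameters, and the synthetic data are hypothetical.

```python
import numpy as np


def ppca_impute(X, q=2, n_iter=50):
    """Fill NaNs by repeatedly fitting closed-form ML PPCA (Tipping & Bishop)
    and replacing missing cells with the posterior-mean reconstruction.
    Simplified sketch: not the exact missing-data EM algorithm. Requires q < d."""
    X = np.asarray(X, dtype=float)
    mask = np.isnan(X)
    filled = np.where(mask, np.nanmean(X, axis=0), X)   # start from column means
    for _ in range(n_iter):
        mu = filled.mean(axis=0)
        evals, evecs = np.linalg.eigh(np.cov(filled, rowvar=False))
        evals, evecs = evals[::-1], evecs[:, ::-1]       # sort eigenpairs descending
        sigma2 = evals[q:].mean()                        # ML noise variance
        W = evecs[:, :q] * np.sqrt(np.maximum(evals[:q] - sigma2, 1e-12))
        M = W.T @ W + sigma2 * np.eye(q)
        Z = np.linalg.solve(M, W.T @ (filled - mu).T).T  # E[z | x] for every row
        filled = np.where(mask, Z @ W.T + mu, X)         # only missing cells change
    return filled


def robust_scale(X_fit, X_apply):
    """Robust scaling: subtract the median, divide by the IQR (stats from X_fit)."""
    med = np.median(X_fit, axis=0)
    q1, q3 = np.percentile(X_fit, [25, 75], axis=0)
    iqr = np.where(q3 > q1, q3 - q1, 1.0)                # guard constant columns
    return (X_apply - med) / iqr


def sgd_linear(X, y, lr=0.02, epochs=100, batch=16, seed=0):
    """Linear regressor trained by mini-batch SGD on mean squared error."""
    rng = np.random.default_rng(seed)
    w, b = np.zeros(X.shape[1]), 0.0
    for _ in range(epochs):
        order = rng.permutation(len(X))
        for s in range(0, len(X), batch):
            j = order[s:s + batch]
            err = X[j] @ w + b - y[j]
            w -= lr * 2 * X[j].T @ err / len(j)
            b -= lr * 2 * err.mean()
    return lambda Xn: Xn @ w + b


def knn(X_train, y_train, k=5):
    """Hypothetical second base model: k-nearest-neighbour regression."""
    def predict(Xn):
        d2 = ((Xn[:, None, :] - X_train[None, :, :]) ** 2).sum(axis=-1)
        return y_train[np.argsort(d2, axis=1)[:, :k]].mean(axis=1)
    return predict


def weighted_average(models, X_val, y_val):
    """Assumed weighting rule: w_i proportional to 1 / validation MSE, summing to 1."""
    inv = np.array([1.0 / np.mean((m(X_val) - y_val) ** 2) for m in models])
    weights = inv / inv.sum()
    return weights, lambda Xn: sum(w * m(Xn) for w, m in zip(weights, models))


# Synthetic demo data (not from the paper): 8 correlated features, 2 latent factors.
rng = np.random.default_rng(42)
X_full = rng.normal(size=(400, 2)) @ rng.normal(size=(2, 8)) + 0.1 * rng.normal(size=(400, 8))
y = X_full @ rng.normal(size=8) + 0.1 * rng.normal(size=400)
X_missing = X_full.copy()
X_missing[rng.random(X_full.shape) < 0.1] = np.nan      # knock out ~10% of cells

X_imp = ppca_impute(X_missing, q=2)
tr, va, te = slice(0, 240), slice(240, 320), slice(320, 400)
Xs = robust_scale(X_imp[tr], X_imp)                      # scaler fitted on training rows
models = [sgd_linear(Xs[tr], y[tr]), knn(Xs[tr], y[tr])]
weights, ensemble = weighted_average(models, Xs[va], y[va])
test_mse = np.mean((ensemble(Xs[te]) - y[te]) ** 2)
print("weights:", weights.round(3), "test MSE:", round(float(test_mse), 4))
```

Two simplifications are worth noting: the imputer is fitted on all rows at once, whereas a leakage-free evaluation would estimate the PPCA parameters from training rows only; and the Extra Trees imputation and the Min-Max and Standard scalers that the study compares against are omitted for brevity.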
Source journal: Atmospheric Pollution Research (ENVIRONMENTAL SCIENCES)
CiteScore: 8.30
Self-citation rate: 6.70%
Articles published: 256
Review time: 36 days
Journal description: Atmospheric Pollution Research (APR) is an international journal designed for the publication of articles on air pollution. Papers should present novel experimental results, theory and modeling of air pollution on local, regional, or global scales. Areas covered are research on inorganic, organic, and persistent organic air pollutants, air quality monitoring, air quality management, atmospheric dispersion and transport, air-surface (soil, water, and vegetation) exchange of pollutants, dry and wet deposition, indoor air quality, exposure assessment, health effects, satellite measurements, natural emissions, atmospheric chemistry, greenhouse gases, and effects on climate change.