Air pollution prediction with advanced preprocessing and deep ensemble learning
Gaurav Narkhede, Mustafa Poonawala, Atharva Sonawane, Anil Hiwale, Arvind R. Singh
Atmospheric Pollution Research, Volume 16, Issue 10, Article 102610. Published 2025-06-07.
DOI: 10.1016/j.apr.2025.102610
URL: https://www.sciencedirect.com/science/article/pii/S1309104225002120
Journal impact factor: 3.9 (JCR Q2, Environmental Sciences)
Citations: 0
Abstract
This study investigates how model selection and preprocessing techniques affect air pollution prediction performance, with a view to supporting the Sustainable Development Goals (SDGs). Training predictive models accurately requires effective handling of missing or null values in environmental datasets. To address this, Probabilistic Principal Component Analysis (PPCA) and the Extra Tree Regressor are used for data imputation, followed by scaling with the Robust Scaler, Min-Max Scaler, and Standard Scaler. A comparison of these preprocessing methods showed that PPCA was the most suitable choice for imputing missing values in air quality datasets, while the Robust Scaler provided the most reliable and accurate scaling. In addition, Stochastic Gradient Descent (SGD) is integrated as an optimization technique to improve model performance. The Weighted Average ensemble method, applied with PPCA imputation and the Robust Scaler, demonstrated superior predictive capability. The results point to further gains from additional ensemble techniques and model optimization strategies, opening avenues for future research on improving prediction precision and advancing the Sustainable Development Goals linked to environmental sustainability.
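As a rough illustration of two steps named in the abstract, the sketch below shows robust scaling (centering on the median and dividing by the interquartile range, the usual definition of a robust scaler) and a weighted-average ensemble. It is not the authors' implementation. The abstract does not say how the ensemble weights were chosen, so the inverse-validation-error weighting here is an assumption, and the function names, readings, and predictions are all hypothetical. Imputation with PPCA or the Extra Tree Regressor is not shown.

```python
from statistics import median, quantiles


def robust_scale(values):
    """Center on the median and divide by the interquartile range (IQR)."""
    # "inclusive" quartiles use linear interpolation between data points.
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    if iqr == 0:
        iqr = 1.0  # constant feature: avoid division by zero
    med = median(values)
    return [(v - med) / iqr for v in values]


def inverse_error_weights(validation_errors):
    """Assumed weighting scheme: weights proportional to 1 / error, summing to 1."""
    inverse = [1.0 / e for e in validation_errors]
    total = sum(inverse)
    return [w / total for w in inverse]


def weighted_average(predictions, weights):
    """Combine per-model prediction lists into one ensemble prediction list."""
    return [
        sum(w * p for w, p in zip(weights, step))
        for step in zip(*predictions)
    ]


# Hypothetical PM2.5 readings with one extreme spike (illustrative values only).
pm25 = [10.0, 12.0, 14.0, 16.0, 300.0]
scaled = robust_scale(pm25)

# Hypothetical predictions from two base models and their validation errors.
model_a = [20.0, 30.0]
model_b = [40.0, 50.0]
weights = inverse_error_weights([1.0, 3.0])
ensemble = weighted_average([model_a, model_b], weights)
```

Because the median and IQR are computed from the middle of the distribution, the single spike in `pm25` does not change how the other readings are scaled. This is one plausible reason a robust scaler could suit air quality data with occasional extreme values, though the abstract itself does not state why it performed best.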
About the journal:
Atmospheric Pollution Research (APR) is an international journal designed for the publication of articles on air pollution. Papers should present novel experimental results, theory and modeling of air pollution on local, regional, or global scales. Areas covered are research on inorganic, organic, and persistent organic air pollutants, air quality monitoring, air quality management, atmospheric dispersion and transport, air-surface (soil, water, and vegetation) exchange of pollutants, dry and wet deposition, indoor air quality, exposure assessment, health effects, satellite measurements, natural emissions, atmospheric chemistry, greenhouse gases, and effects on climate change.