Air Quality Forecasting Using Machine Learning: Comparative Analysis and Ensemble Strategies for Enhanced Prediction
Authors: Yıldırım Özüpak, Feyyaz Alpsalaz, Emrah Aslan
Journal: Water, Air, & Soil Pollution, 236 (7), published 2025-05-14 (Impact Factor 3.0; JCR Q2, Environmental Sciences)
DOI: 10.1007/s11270-025-08122-8
Article: https://link.springer.com/article/10.1007/s11270-025-08122-8
PDF: https://link.springer.com/content/pdf/10.1007/s11270-025-08122-8.pdf
Citations: 0
Abstract
Air pollution poses a critical challenge to environmental sustainability, public health, and urban planning. Accurate air quality prediction is essential for devising effective management strategies and early warning systems. This study used a dataset of hourly pollutant measurements, including PM2.5, NOx, CO, and benzene, recorded by five metal oxide sensors and a certified analyzer in a polluted urban area. The dataset totals 9,357 records collected over one year (March 2004–February 2005) and was taken from the Kaggle Air Quality Data Set. Ten machine learning regression models were compared: XGBoost, LightGBM, Random Forest, Gradient Boosting, CatBoost, Support Vector Regression (SVR) with Bayesian Optimization, Decision Tree, K-Nearest Neighbors (KNN), Elastic Net, and Bayesian Ridge. Model performance was enhanced through Bayesian optimization and randomized cross-validation, and stacking was employed to leverage the strengths of the base models. Experimental results showed that hyperparameter optimization and ensemble strategies significantly improved accuracy. The SVR model tuned with Bayesian optimization achieved the highest performance: an R² score of 99.94%, an MAE of 0.0120, and an MSE of 0.0005. These findings underscore the methodology’s efficacy in precisely capturing the spatial and temporal dynamics of air pollution.
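The three metrics quoted in the abstract (R², MAE, MSE) can be illustrated with a minimal sketch. It uses only the Python standard library and made-up numbers; the function names and the sample values are assumptions for illustration and are not taken from the study. Note that the abstract reports R² as a percentage, so 99.94% corresponds to a score of 0.9994, and that MAE and MSE are expressed in the units of the target variable, so their magnitude depends on any scaling applied to it.

```python
from typing import Sequence


def mean_absolute_error(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Average absolute difference between observed and predicted values."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def mean_squared_error(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Average squared difference between observed and predicted values."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def r2_score(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Coefficient of determination: 1 - (residual sum of squares / total sum of squares)."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


# Hypothetical observed and predicted values; NOT data from the paper.
observed = [1.0, 2.0, 3.0, 4.0]
predicted = [1.1, 1.9, 3.2, 3.8]

mae = mean_absolute_error(observed, predicted)
mse = mean_squared_error(observed, predicted)
r2 = r2_score(observed, predicted)

print(f"MAE = {mae:.4f}")  # MAE = 0.1500
print(f"MSE = {mse:.4f}")  # MSE = 0.0250
print(f"R2  = {r2:.2%}")   # R2  = 98.00%
```

Because MSE squares each error, a few large misses raise it much more than they raise MAE, which is one reason the two are commonly reported together.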
About the journal:
Water, Air, & Soil Pollution is an international, interdisciplinary journal on all aspects of pollution and solutions to pollution in the biosphere. This includes chemical, physical and biological processes affecting flora, fauna, water, air and soil in relation to environmental pollution. Because of its scope, the subject areas are diverse and include all aspects of pollution sources, transport, deposition, accumulation, acid precipitation, atmospheric pollution, metals, aquatic pollution including marine pollution and ground water, waste water, pesticides, soil pollution, sewage, sediment pollution, forestry pollution, effects of pollutants on humans, vegetation, fish, aquatic species, micro-organisms, and animals, environmental and molecular toxicology applied to pollution research, biosensors, global and climate change, ecological implications of pollution and pollution models. Water, Air, & Soil Pollution also publishes manuscripts on novel methods used in the study of environmental pollutants, environmental toxicology, environmental biology, novel environmental engineering related to pollution, biodiversity as influenced by pollution, novel environmental biotechnology as applied to pollution (e.g. bioremediation), environmental modelling and biorestoration of polluted environments.
Articles should not be submitted that are of local interest only and do not advance international knowledge in environmental pollution and solutions to pollution. Articles that simply replicate known knowledge or techniques while researching a local pollution problem will normally be rejected without review. Submitted articles must have up-to-date references, employ correct experimental replication and statistical analysis where needed, and contain a significant contribution to new knowledge. The publishing and editorial team sincerely appreciate your cooperation.
Water, Air, & Soil Pollution publishes research papers, review articles, mini-reviews, and book reviews.