Air Quality Forecasting Using Machine Learning: Comparative Analysis and Ensemble Strategies for Enhanced Prediction

Impact factor 3.8 | Zone 4: Environmental Science and Ecology | JCR Q2: Environmental Sciences
Yıldırım Özüpak, Feyyaz Alpsalaz, Emrah Aslan
Journal: Water, Air, & Soil Pollution, volume 236, issue 7
Published: 2025-05-14
DOI: 10.1007/s11270-025-08122-8
Full text: https://link.springer.com/article/10.1007/s11270-025-08122-8
Citations: 0

Abstract

Air pollution poses a critical challenge to environmental sustainability, public health, and urban planning. Accurate air quality prediction is essential for devising effective management strategies and early warning systems. This study used a dataset of hourly measurements of pollutants such as PM2.5, NOx, CO, and benzene, recorded by five metal oxide sensors and a certified analyzer in a polluted urban area. The data comprise 9,357 records collected over one year (March 2004–February 2005) and were obtained from the Kaggle Air Quality Data Set. Ten machine learning regression models were compared: XGBoost, LightGBM, Random Forest, Gradient Boosting, CatBoost, Support Vector Regression (SVR) with Bayesian optimization, Decision Tree, K-Nearest Neighbors (KNN), Elastic Net, and Bayesian Ridge. Model performance was enhanced through Bayesian optimization and randomized cross-validation, and stacking was employed to leverage the strengths of the base models. Hyperparameter optimization and ensemble strategies significantly improved accuracy, and the SVR model tuned via Bayesian optimization achieved the highest performance: an R² score of 99.94%, an MAE of 0.0120, and an MSE of 0.0005. These findings underscore the methodology's efficacy in precisely capturing the spatial and temporal dynamics of air pollution.
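
The abstract reports three regression metrics (R², MAE, and MSE) without defining them. As a minimal sketch, assuming the standard textbook definitions, they can be computed as shown below. The function name `regression_metrics` and the sample values are hypothetical and are not taken from the paper or its dataset. An R² score quoted as 99.94% corresponds to a value of 0.9994 on the usual scale where 1 is a perfect fit.

```python
from statistics import fmean


def regression_metrics(y_true, y_pred):
    """Return (r2, mae, mse) for two equal-length sequences of numbers."""
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("inputs must be non-empty and of equal length")

    # Residuals: observed value minus predicted value.
    errors = [t - p for t, p in zip(y_true, y_pred)]

    mae = fmean(abs(e) for e in errors)   # mean absolute error
    mse = fmean(e * e for e in errors)    # mean squared error

    # R^2 = 1 - SS_res / SS_tot, where SS_tot measures the spread of the
    # observations around their own mean.
    mean_true = fmean(y_true)
    ss_res = sum(e * e for e in errors)
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    if ss_tot == 0:
        raise ValueError("R^2 is undefined when all observed values are equal")
    r2 = 1.0 - ss_res / ss_tot

    return r2, mae, mse


# Hypothetical observations and predictions, for illustration only.
y_true = [1.0, 2.0, 3.0, 4.0]
y_pred = [1.1, 1.9, 3.2, 3.8]
r2, mae, mse = regression_metrics(y_true, y_pred)
print(f"R2={r2:.4f}  MAE={mae:.4f}  MSE={mse:.4f}")
```

MAE and MSE depend on the scale of the target variable, so values such as 0.0120 and 0.0005 are only meaningful relative to the units or normalization of the predicted quantity, which the abstract does not state.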

Source journal: Water, Air, & Soil Pollution (Environmental Sciences)
CiteScore: 4.50
Self-citation rate: 6.90%
Articles published: 448
Review time: 2.6 months
Journal description: Water, Air, & Soil Pollution is an international, interdisciplinary journal on all aspects of pollution and solutions to pollution in the biosphere. This includes chemical, physical and biological processes affecting flora, fauna, water, air and soil in relation to environmental pollution. Because of its scope, the subject areas are diverse and include all aspects of pollution sources, transport, deposition, accumulation, acid precipitation, atmospheric pollution, metals, aquatic pollution including marine pollution and ground water, waste water, pesticides, soil pollution, sewage, sediment pollution, forestry pollution, effects of pollutants on humans, vegetation, fish, aquatic species, micro-organisms, and animals, environmental and molecular toxicology applied to pollution research, biosensors, global and climate change, ecological implications of pollution and pollution models. Water, Air, & Soil Pollution also publishes manuscripts on novel methods used in the study of environmental pollutants, environmental toxicology, environmental biology, novel environmental engineering related to pollution, biodiversity as influenced by pollution, novel environmental biotechnology as applied to pollution (e.g. bioremediation), environmental modelling and biorestoration of polluted environments. Articles that are of local interest only and do not advance international knowledge in environmental pollution and solutions to pollution should not be submitted. Articles that simply replicate known knowledge or techniques while researching a local pollution problem will normally be rejected without review. Submitted articles must have up-to-date references, employ the correct experimental replication and statistical analysis where needed, and contain a significant contribution to new knowledge. The publishing and editorial team sincerely appreciate your cooperation. Water, Air, & Soil Pollution publishes research papers, review articles, mini-reviews, and book reviews.