Identification of best machine learning model for the real-time vehicular data based prediction of PM2.5 and PM10

Impact factor: 3.9 · Zone 3 (Environmental Science and Ecology) · JCR Q2 (Environmental Sciences)
Rohit Kumar, Ramagopal V.S. Uppaluri
Atmospheric Pollution Research, volume 16, issue 9, Article 102575. Published 2025-05-10.
DOI: 10.1016/j.apr.2025.102575
https://www.sciencedirect.com/science/article/pii/S1309104225001771
Citations: 0

Abstract

In fast-developing urban regions such as Guwahati city, predicting particulate matter (PM10 and PM2.5) concentrations is vital for assessing air quality and protecting public health. Using a large dataset comprising historical real-time pollution data, vehicle population counts (petrol and diesel), and meteorological variables (temperature, wind direction, solar radiation, relative humidity, wind speed), the article applies alternative machine learning algorithms to predict PM2.5 and PM10 levels in Guwahati. The intricate temporal patterns and seasonal trends in the air pollution data were captured with six ML models: Extreme Gradient Boosting, Decision Tree, Random Forest, Support Vector Regression, K-Nearest Neighbours, and Multilayer Perceptron. The models were assessed with the coefficient of determination (R²), root mean square error (RMSE), and mean absolute error (MAE). The performance analysis examined how sensitive the models are to lag features, rolling statistics, seasonal decomposition components, temporal features, and seasonality-specific issues. The results highlight the effectiveness of machine learning models in predicting air quality parameters. Ensemble techniques such as Extreme Gradient Boosting outperformed the other models, achieving the lowest RMSE values (0.024 μg/m3 for PM2.5 and 0.041 μg/m3 for PM10), MAE values of 0.017 and 0.027 for PM2.5 and PM10 respectively, and coefficients of determination of 0.96 for PM2.5 and 0.92 for PM10. These findings can support the implementation of pragmatic policies to safeguard the city's air quality.
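
The abstract names lag features, rolling statistics, and temporal features as inputs, and RMSE, MAE, and the coefficient of determination as metrics, without giving their construction. The following is a minimal sketch of how such features and metrics are commonly computed, not the authors' pipeline. The function names, the lag and window sizes, and the hourly readings are all hypothetical, and a simple persistence baseline (predict the previous hour's value) stands in for the trained models.

```python
import math
from datetime import datetime, timedelta


def build_features(timestamps, pm_values, lags=(1, 2), window=3):
    """Build lag, rolling-mean and temporal features for a PM time series.

    Rows are emitted only once enough history exists, and every feature
    uses strictly past values, so the target never leaks into its inputs.
    """
    start = max(max(lags), window)
    rows, targets = [], []
    for t in range(start, len(pm_values)):
        history = pm_values[t - window:t]            # the `window` values before t
        row = {f"lag_{k}": pm_values[t - k] for k in lags}
        row["rolling_mean"] = sum(history) / window  # rolling statistic
        row["hour"] = timestamps[t].hour             # temporal features
        row["month"] = timestamps[t].month
        rows.append(row)
        targets.append(pm_values[t])
    return rows, targets


def rmse(y_true, y_pred):
    """Root mean square error."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(y_true, y_pred)) / len(y_true))


def mae(y_true, y_pred):
    """Mean absolute error."""
    return sum(abs(a - b) for a, b in zip(y_true, y_pred)) / len(y_true)


def r2(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((a - b) ** 2 for a, b in zip(y_true, y_pred))
    ss_tot = sum((a - mean_true) ** 2 for a in y_true)
    return 1.0 - ss_res / ss_tot


# Synthetic, made-up hourly readings (not data from the article).
start_time = datetime(2024, 1, 1, 0)
timestamps = [start_time + timedelta(hours=i) for i in range(8)]
pm25 = [40.0, 42.0, 45.0, 43.0, 47.0, 50.0, 48.0, 52.0]

rows, targets = build_features(timestamps, pm25)

# Persistence baseline: predict the previous hour's value (lag_1).
baseline = [row["lag_1"] for row in rows]
print(rmse(targets, baseline), mae(targets, baseline), r2(targets, baseline))
```

A persistence baseline of this kind is a useful reference point: a learned model such as gradient boosting is only informative if it beats it on the same held-out period. Note also that RMSE and MAE carry the units of the target, so their magnitudes depend on whether the target was scaled before fitting.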
Source journal
Atmospheric Pollution Research (Environmental Sciences)
CiteScore: 8.30
Self-citation rate: 6.70%
Articles published: 256
Review time: 36 days
Journal description: Atmospheric Pollution Research (APR) is an international journal designed for the publication of articles on air pollution. Papers should present novel experimental results, theory and modeling of air pollution on local, regional, or global scales. Areas covered are research on inorganic, organic, and persistent organic air pollutants, air quality monitoring, air quality management, atmospheric dispersion and transport, air-surface (soil, water, and vegetation) exchange of pollutants, dry and wet deposition, indoor air quality, exposure assessment, health effects, satellite measurements, natural emissions, atmospheric chemistry, greenhouse gases, and effects on climate change.