Identification of best machine learning model for the real-time vehicular data based prediction of PM2.5 and PM10

IF 3.5 | Zone 3, Environmental Science and Ecology | JCR Q2, ENVIRONMENTAL SCIENCES

Atmospheric Pollution Research Pub Date : 2025-05-10 DOI:10.1016/j.apr.2025.102575

Rohit Kumar, Ramagopal V.S. Uppaluri

Atmospheric Pollution Research, Volume 16, Issue 9, Article 102575. Journal Article. https://www.sciencedirect.com/science/article/pii/S1309104225001771

Citations: 0

Abstract

In fast-developing urban regions such as Guwahati, predicting particulate matter (PM10 and PM2.5) concentrations is vital for assessing air quality and protecting public health. Using a large dataset comprising historical real-time pollution data, vehicle population counts (petrol and diesel), and meteorological variables (temperature, wind direction, solar radiation, relative humidity, and wind speed), the article applies alternative machine learning algorithms to predict PM2.5 and PM10 levels in the city. The intricate temporal patterns and seasonal trends of the air pollution data were captured with six ML models: Extreme Gradient Boosting, Decision Tree, Random Forest, Support Vector Regression, K-Nearest Neighbours, and Multilayer Perceptron. The models were evaluated using the coefficient of determination (R²), root mean square error (RMSE), and mean absolute error (MAE). The analysis also examined how sensitive model performance is to lag features, rolling statistics, seasonal decomposition components, temporal features, and season-specific effects. The results highlight the effectiveness of machine learning models in predicting air quality parameters. Ensemble techniques such as Extreme Gradient Boosting outperformed the other models, achieving the lowest RMSE values (0.024 μg/m³ for PM2.5 and 0.041 μg/m³ for PM10), MAE values of 0.017 and 0.027 for PM2.5 and PM10 respectively, and coefficients of determination of 0.96 for PM2.5 and 0.92 for PM10. Accordingly, the investigation can support the implementation of pragmatic policies to safeguard the city's air quality.
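
The lag features, rolling statistics, and evaluation metrics named in the abstract can be illustrated with a minimal sketch. It is not the authors' pipeline: the function names, the lag of 1, the window of 3, and the toy values are all assumptions chosen for illustration, and only the Python standard library is used.

```python
import math


def rmse(y_true, y_pred):
    """Root mean square error between observed and predicted values."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


def mae(y_true, y_pred):
    """Mean absolute error between observed and predicted values."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def r2(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_t = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_t) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def lag_and_rolling(series, lag=1, window=3):
    """Build (lagged value, rolling mean of the previous `window` values, target) rows.

    Only past observations feed each row, so the target never leaks into
    its own features. `lag` and `window` are illustrative defaults.
    """
    rows = []
    start = max(lag, window)
    for i in range(start, len(series)):
        lagged = series[i - lag]
        rolling_mean = sum(series[i - window:i]) / window
        rows.append((lagged, rolling_mean, series[i]))
    return rows


if __name__ == "__main__":
    # Hypothetical concentration readings, not data from the study.
    readings = [1.0, 2.0, 3.0, 4.0, 5.0]
    for lagged, rolling_mean, target in lag_and_rolling(readings):
        print(lagged, rolling_mean, target)

    # Hypothetical observed vs. predicted values.
    observed = [1.0, 2.0, 3.0]
    predicted = [1.0, 2.0, 4.0]
    print(rmse(observed, predicted), mae(observed, predicted), r2(observed, predicted))
```

In practice a regressor would be trained on the feature columns and scored with these three metrics on a held-out period that follows the training period in time.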

Source journal: Atmospheric Pollution Research (ENVIRONMENTAL SCIENCES)

CiteScore: 8.30

Self-citation rate: 6.70%

Articles published: 256

Review time: 36 days

About the journal: Atmospheric Pollution Research (APR) is an international journal designed for the publication of articles on air pollution. Papers should present novel experimental results, theory and modeling of air pollution on local, regional, or global scales. Areas covered are research on inorganic, organic, and persistent organic air pollutants, air quality monitoring, air quality management, atmospheric dispersion and transport, air-surface (soil, water, and vegetation) exchange of pollutants, dry and wet deposition, indoor air quality, exposure assessment, health effects, satellite measurements, natural emissions, atmospheric chemistry, greenhouse gases, and effects on climate change.