Identification of best machine learning model for the real-time vehicular data based prediction of PM2.5 and PM10
Authors: Rohit Kumar, Ramagopal V.S. Uppaluri
Journal: Atmospheric Pollution Research, 16 (9), Article 102575 (impact factor 3.9; JCR Q2, Environmental Sciences)
Publication date: 2025-05-10
DOI: 10.1016/j.apr.2025.102575
URL: https://www.sciencedirect.com/science/article/pii/S1309104225001771
Citation count: 0
Abstract
In fast-developing urban regions such as Guwahati city, predicting particulate matter (PM10 and PM2.5) concentrations is vital for assessing air quality and protecting public health. Using a large dataset comprising historical real-time pollution data, vehicle population counts (petrol and diesel), and meteorological variables (temperature, wind direction, solar radiation, relative humidity, wind speed), the article applies several machine learning algorithms to predict PM2.5 and PM10 levels in Guwahati. The intricate temporal patterns and seasonal trends of the air pollution data were captured with six models: Extreme Gradient Boosting, Decision Tree, Random Forest, Support Vector Regression, K-Nearest Neighbours, and Multilayer Perceptron. The models were assessed with the coefficient of determination (R²), root mean square error (RMSE), and mean absolute error (MAE). A performance-based analysis examined how sensitive the models are to lag features, rolling statistics, seasonal decomposition components, temporal features, and season-specific effects. The results highlight the effectiveness of machine learning models in predicting air quality parameters. Ensemble techniques such as Extreme Gradient Boosting outperformed the other models, achieving the lowest RMSE values of 0.024 μg/m³ and 0.041 μg/m³ for PM2.5 and PM10, respectively; MAE values of 0.017 and 0.027 for PM2.5 and PM10, respectively; and coefficients of determination of 0.96 for PM2.5 and 0.92 for PM10. The findings can support the implementation of pragmatic policies to safeguard the city's air quality.
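The feature types and metrics named in the abstract can be illustrated with a minimal sketch in plain Python. It is not the authors' pipeline: the function names, the lag choices, the window size, and the sample readings are all hypothetical, and a naive persistence baseline stands in for the trained models.

```python
import math

def lag_features(series, lags):
    """One row per time step t, holding series[t - l] for each lag l.

    Rows start at t = max(lags) so that every lag has a value available.
    """
    start = max(lags)
    return [[series[t - l] for l in lags] for t in range(start, len(series))]

def rolling_mean(series, window):
    """Mean of the `window` values strictly before t, so the target at t is not leaked."""
    return [sum(series[t - window:t]) / window for t in range(window, len(series))]

def rmse(y_true, y_pred):
    """Root mean square error."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(y_true, y_pred)) / len(y_true))

def mae(y_true, y_pred):
    """Mean absolute error."""
    return sum(abs(a - b) for a, b in zip(y_true, y_pred)) / len(y_true)

def r2(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((a - b) ** 2 for a, b in zip(y_true, y_pred))
    ss_tot = sum((a - mean) ** 2 for a in y_true)
    return 1.0 - ss_res / ss_tot

# Hypothetical PM2.5 readings (illustrative values only, not data from the paper).
pm25 = [10.0, 12.0, 11.0, 13.0, 15.0, 14.0]

X = lag_features(pm25, lags=[1, 2])   # features: previous value and the one before it
y = pm25[2:]                          # targets aligned with the rows of X
smoothed = rolling_mean(pm25, 3)      # rolling statistic over the 3 preceding readings

# Persistence baseline (an assumption for illustration): predict the lag-1 value.
pred = [row[0] for row in X]

print("RMSE:", rmse(y, pred))
print("MAE: ", mae(y, pred))
print("R2:  ", r2(y, pred))
```

A negative R² for the baseline here simply means that, on this toy series, predicting the previous value does worse than predicting the mean; a trained model would be compared against the same three metrics.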
About the journal:
Atmospheric Pollution Research (APR) is an international journal designed for the publication of articles on air pollution. Papers should present novel experimental results, theory and modeling of air pollution on local, regional, or global scales. Areas covered are research on inorganic, organic, and persistent organic air pollutants, air quality monitoring, air quality management, atmospheric dispersion and transport, air-surface (soil, water, and vegetation) exchange of pollutants, dry and wet deposition, indoor air quality, exposure assessment, health effects, satellite measurements, natural emissions, atmospheric chemistry, greenhouse gases, and effects on climate change.