Improved k-NN Regression Model Using Random Forests for Air Pollution Prediction
Siddhartha Sharma, R. Lakshmi
2023 International Conference on Smart Applications, Communications and Networking (SmartNets), 2023-07-25
DOI: 10.1109/SmartNets58706.2023.10216028
Abstract
In this paper, we review various k-Nearest-Neighbor (k-NN) based models and their accuracies in order to develop a better model for predicting concentrations of air pollutants. The proposed model first splits the range of target variable values into a number of buckets. Then a hybrid k-NN model, which combines weighted attribute k-NN and distance-weighted k-NN and assigns the attribute weights by calculating the Information Gain of each attribute, is used to calculate the target variable value of each test case. The proposed model decreases the root mean square error (RMSE) of predicted NO, NO2, and NOx values by 28.29%, 29.44%, and 16.51%, respectively, compared to the state of the art. Similarly, the mean absolute error (MAE) values for NO, NO2, and NOx are decreased by 18.26%, 33.67%, and 14.54%, respectively. The model gives good results when the buckets are of nearly equal size.
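As a rough illustration of the hybrid step described in the abstract, the sketch below combines a per-attribute weighted distance with inverse-distance weighting of the k nearest neighbours, and defines the two error metrics the abstract reports. It is not the authors' implementation: the function names, the weighted Euclidean distance, the inverse-distance scheme, and the example attribute weights are all assumptions. In the paper the attribute weights come from Information Gain and the target range is bucketed first; neither step is reproduced here.

```python
import math


def hybrid_knn_predict(train_X, train_y, query, attr_weights, k=3, eps=1e-9):
    """Predict one target value with attribute-weighted, distance-weighted k-NN.

    attr_weights is a hypothetical stand-in for per-attribute weights
    (the abstract derives them from Information Gain; here they are inputs).
    """
    scored = []
    for row, target in zip(train_X, train_y):
        # Weighted Euclidean distance: each attribute's squared difference
        # is scaled by that attribute's weight (an assumed distance form).
        dist = math.sqrt(
            sum(w * (a - b) ** 2 for w, a, b in zip(attr_weights, row, query))
        )
        scored.append((dist, target))

    # Keep the k training rows closest to the query.
    neighbours = sorted(scored, key=lambda pair: pair[0])[:k]

    # Closer neighbours get larger votes (inverse distance, an assumed scheme);
    # eps avoids division by zero when a neighbour coincides with the query.
    votes = [1.0 / (dist + eps) for dist, _ in neighbours]
    weighted_sum = sum(v * target for v, (_, target) in zip(votes, neighbours))
    return weighted_sum / sum(votes)


def rmse(actual, predicted):
    """Root mean square error between two equal-length sequences."""
    return math.sqrt(
        sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)
    )


def mae(actual, predicted):
    """Mean absolute error between two equal-length sequences."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


# Toy data, invented purely for illustration (not from the paper).
train_X = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [5.0, 5.0]]
train_y = [10.0, 20.0, 30.0, 100.0]
example_weights = [1.0, 1.0]  # placeholder for Information Gain weights

prediction = hybrid_knn_predict(train_X, train_y, [0.5, 0.0], example_weights, k=2)
```

Taking the attribute weights as an input keeps the sketch independent of any particular Information Gain calculation, so weights computed by another method can be supplied without changing the prediction code.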