Garbage prediction using regression analysis for municipal corporations of Indian cities

IF 1.3 Q4 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE

Cognitive Computation and Systems · Pub Date: 2024-10-19 · DOI: 10.1049/ccs2.12103

Raj Kumar Sharma, Manisha Jailia

Cognitive Computation and Systems, volume 6, issue 4, pages 74-85. Article page: https://ietresearch.onlinelibrary.wiley.com/doi/10.1049/ccs2.12103

Citations: 0

Abstract

Garbage management is critical and poses enormous environmental challenges, and it has always been a vital issue for municipal corporations. Although municipal agencies have developed and deployed garbage management systems, garbage forecasting still plays a crucial role in improving or creating such systems. This research examines data from 212 cities to propose a regression model for garbage forecasting and control. To establish the relationships between the variables, the descriptive study applies statistical techniques to describe the data collected from municipal corporations and conducts a correlation analysis. Population and garbage are strongly correlated, with a correlation coefficient of 0.922144. The primary research is used to build an alternative hypothesis that the chosen variables are highly dependent on one another. During pre-processing, the dataset is scaled and split into training and testing sets in an 80:20 ratio. The regression analysis uses urban area and population as independent variables and daily garbage production as the dependent variable. Several regression models are fitted: multiple linear regression (MLR), artificial neural network (ANN), decision tree regression (DTR), and random forest regression (RFR). The MLR model's R² value of 0.85 indicates that it can forecast daily garbage production reasonably accurately from just two independent variables. RFR (MSE: 100,078.749; MAE: 182.212) has the lowest MSE among all the models, meaning it provides the most accurate predictions on average; fit values of 8.85 and 316.23 are obtained from the error distribution with a bin value of 25. The estimates from each model are compared with the test data values on line graphs and Taylor plots. The MSE, the MAE, and the Taylor plot together show that the RFR model is best suited to predicting daily garbage production in a city. This research therefore recommends the random forest model, which performs best among the models compared, for this class of problem.
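The quantities reported in the abstract (the correlation coefficient, the 80:20 split, MSE, MAE, and R²) have standard textbook definitions. The following is a minimal standard-library sketch of those definitions, not the authors' implementation. It assumes a Pearson correlation and a seeded random shuffle for the split, since the abstract does not say how the split was drawn. All function names and the toy data are hypothetical.

```python
import random
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


def split_80_20(rows, seed=0):
    """Shuffle a copy of rows and split it 80:20 into (train, test)."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)  # seeded so the split is repeatable
    cut = len(shuffled) * 8 // 10
    return shuffled[:cut], shuffled[cut:]


def mse(y_true, y_pred):
    """Mean squared error."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def mae(y_true, y_pred):
    """Mean absolute error."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def r2(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_t = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_t) ** 2 for t in y_true)
    return 1 - ss_res / ss_tot


# Hypothetical toy rows (population, daily garbage) for ten made-up cities;
# these are NOT values from the paper's 212-city dataset.
rows = [(p, 0.5 * p + 10) for p in range(100, 1100, 100)]
train, test = split_80_20(rows)

if __name__ == "__main__":
    populations = [p for p, _ in rows]
    garbage = [g for _, g in rows]
    print("correlation:", pearson_r(populations, garbage))
    print("train/test sizes:", len(train), len(test))
```

Because MSE squares each residual, a few badly mispredicted cities can dominate it, whereas MAE weights all residuals linearly; reporting both, as the abstract does, gives a fuller picture of the error.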



Source journal