K-Means Clustering driven Deep Spatiotemporal Learning Model for PM2.5 Prediction
Kunal G. Srinivas, N. Giri
2022 IEEE International Conference on Distributed Computing and Electrical Circuits and Electronics (ICDCECE)
Publication date: 2022-04-23
DOI: https://doi.org/10.1109/icdcece53908.2022.9793281
Citations: 0
Abstract
This paper proposes a novel and robust clustering-driven deep spatiotemporal learning model for PM2.5 concentration prediction. Unlike classical approaches to PM2.5 prediction, our model emphasizes both feature improvement and feature learning to achieve a generalizable Big Data analytics solution. Specifically, we consider data from four Chinese cities (Chengdu, Guangzhou, Shenyang, and Shanghai), each with three monitoring stations that provide spatiotemporal features such as timestamp, wind direction, wind speed, temperature, dew point, humidity, and precipitation, together with the corresponding PM2.5 concentration. To alleviate the missing-element problem, the model first performs data wrangling and missing-element removal, followed by clustering with the K-Means algorithm. Unlike classical methods, which learn the input spatiotemporal features directly, we cluster the non-zero instances or features for the different time periods to make learning more efficient. After clustering the dataset, we apply three different deep spatiotemporal learning models, derived from the deep Long Short-Term Memory (LSTM) architecture, to perform PM2.5 prediction. The prediction results and the associated mean square error show that the proposed model outperforms existing techniques, including classical LSTM methods. The results confirm that learning from clustered features yields more accurate predictions than random feature learning. The overall model was implemented on the Apache Spark platform, which makes it suitable for decentralized computation and Big Data analytics.
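The preprocessing steps named in the abstract (missing-element removal, then K-Means clustering) and the mean square error used for evaluation can be illustrated with a minimal pure-Python sketch. This is not the authors' implementation: the paper runs on Apache Spark and feeds the clusters to LSTM models, both of which are omitted here. The function names, the two-feature rows, and the choice of `k` are all hypothetical and chosen only for illustration.

```python
import math
import random


def drop_incomplete(rows):
    """Keep only rows in which no value is None or NaN (assumed meaning of 'missing')."""
    def is_missing(v):
        return v is None or (isinstance(v, float) and math.isnan(v))
    return [row for row in rows if not any(is_missing(v) for v in row)]


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length tuples."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, k, iters=50, seed=0):
    """Plain Lloyd's algorithm: alternate assignment and centroid update."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # initial centroids drawn from the data
    labels = [0] * len(points)
    for _ in range(iters):
        # Assignment step: each point joins its nearest centroid.
        labels = [min(range(k), key=lambda c: sq_dist(p, centroids[c]))
                  for p in points]
        # Update step: each centroid moves to the mean of its members.
        new_centroids = []
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                new_centroids.append(
                    tuple(sum(col) / len(members) for col in zip(*members)))
            else:
                new_centroids.append(centroids[c])  # keep an empty cluster's centroid
        if new_centroids == centroids:  # no centroid moved: converged
            break
        centroids = new_centroids
    return labels, centroids


def mean_squared_error(actual, predicted):
    """Average of squared differences between observed and predicted values."""
    return sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)


if __name__ == "__main__":
    # Made-up (feature_1, feature_2) rows; two contain a missing value.
    rows = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (None, 5.0),
            (10.0, 10.0), (10.0, 11.0), (11.0, 10.0), (float("nan"), 1.0)]
    clean = drop_incomplete(rows)
    labels, centroids = kmeans(clean, k=2)
    print(len(clean), labels, centroids)
```

In a pipeline like the one the abstract describes, the cluster labels would then be used to group instances before sequence models are trained; how the paper does this grouping is not specified in the abstract, so it is left out of the sketch.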