K-Means Clustering driven Deep Spatiotemporal Learning Model for PM2.5 Prediction

2022 IEEE International Conference on Distributed Computing and Electrical Circuits and Electronics (ICDCECE) Pub Date: 2022-04-23 DOI: 10.1109/icdcece53908.2022.9793281

Kunal G. Srinivas, N. Giri



Abstract

This paper proposes a novel and robust clustering-driven deep spatiotemporal learning model for PM2.5 concentration prediction. Unlike classical approaches to PM2.5 prediction, the proposed model emphasizes both feature improvement and feature learning to achieve a generalizable Big Data analytics solution. Specifically, data from four Chinese cities (Chengdu, Guangzhou, Shenyang, and Shanghai) are considered, where each city has three monitoring stations providing spatiotemporal features such as timestamp, wind direction, wind speed, temperature, dew, humidity, precipitation, and the corresponding PM2.5 concentration. To address the missing-element problem, the model first performs data wrangling and removes missing elements, and then clusters the data using the K-Means algorithm. Unlike classical methods, in which the input spatiotemporal features are learnt directly, the non-zero instances or features for the different time periods are clustered to make learning more efficient. After clustering the dataset, three different deep spatiotemporal learning models derived from the deep Long Short-Term Memory (LSTM) architecture are applied to perform PM2.5 prediction. The prediction results and the associated mean squared error show that the proposed model outperforms other existing techniques, including classical LSTM methods. The results confirm that the use of clustered features can yield more accurate predictions than random feature learning. The overall model was implemented on the Apache Spark platform, which makes it suitable for decentralized computation and Big Data analytics.
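The preprocessing steps named in the abstract (removing records with missing elements, then K-Means clustering) and the mean squared error metric can be illustrated with a minimal sketch. This is not the authors' implementation: the paper reports an Apache Spark pipeline feeding LSTM models, whereas the code below is plain Python with no LSTM stage. The function names, the choice of `k=2`, and the toy readings (temperature, humidity, wind speed, PM2.5) are all assumptions made for illustration, not values from the paper.

```python
import math
import random


def drop_incomplete(rows):
    """Keep only rows in which every field is present (not None or NaN)."""
    def present(value):
        return value is not None and not (
            isinstance(value, float) and math.isnan(value)
        )
    return [row for row in rows if all(present(v) for v in row)]


def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, k, iters=50, seed=0):
    """Plain Lloyd's algorithm. Returns (centroids, labels)."""
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(points, k)]
    labels = [0] * len(points)
    for _ in range(iters):
        # Assignment step: each point joins its nearest centroid.
        labels = [
            min(range(k), key=lambda c: squared_distance(p, centroids[c]))
            for p in points
        ]
        # Update step: each centroid moves to the mean of its members.
        updated = []
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                updated.append([sum(col) / len(members) for col in zip(*members)])
            else:
                updated.append(centroids[c])  # keep an empty cluster in place
        if updated == centroids:
            break
        centroids = updated
    return centroids, labels


def mean_squared_error(y_true, y_pred):
    """Average of squared differences between targets and predictions."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


# Synthetic toy readings (hypothetical, not from the paper's dataset):
# (temperature, humidity, wind_speed, pm25)
rows = [
    (5.0, 80.0, 1.0, 150.0),
    (6.0, 82.0, 1.2, 160.0),
    (None, 75.0, 2.0, 90.0),           # missing temperature: dropped
    (25.0, 40.0, 4.0, 30.0),
    (26.0, float("nan"), 4.5, 28.0),   # missing humidity: dropped
    (24.0, 42.0, 3.8, 35.0),
]

clean = drop_incomplete(rows)
features = [list(row[:3]) for row in clean]  # cluster on the weather features
centroids, labels = kmeans(features, k=2)
print(labels)
```

In a fuller pipeline the features would normally be standardized before clustering, since K-Means distances are sensitive to scale, and each cluster's records would then be passed to a sequence model; `mean_squared_error` shows how predictions from such a model could be scored against observed PM2.5 values.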


