Research of PM2.5 Real-Time Prediction Model in Spark Cluster Environment
Lizhi Liu, Jingwei He, Bei Peng, Min Yang, Chenyue Zhang
Proceedings of the 2019 International Conference on Robotics Systems and Vehicle Technology - RSVT '19
Published: 2019-10-16
DOI: 10.1145/3366715.3366722 (https://doi.org/10.1145/3366715.3366722)
Citations: 0
Abstract
Big data technologies offer new approaches to the statistical prediction of ambient air quality. This paper studies how to construct a real-time PM2.5 prediction model for monitoring stations using the R language on a Spark cluster. Real-time monitoring-station data stored in a traditional relational database are converted into a target dataset that can be loaded into the cluster and processed with R. A correlation analysis of the pollutants and meteorological parameters that affect PM2.5 is carried out to determine the input variables of the multiple linear regression used to build the real-time prediction model. In the Spark cluster environment, R uses the sparklyr package and Spark MLlib to construct a prediction model for each monitoring station. Each model is evaluated in four respects (residual analysis, significance testing, coefficient of determination, and test-set prediction) to verify its effectiveness. The experimental results show that the model can predict real-time PM2.5 values accurately.
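
The workflow summarized in the abstract (correlation screening of candidate inputs, multiple linear regression, and evaluation by coefficient of determination on a held-out test set) can be illustrated with a minimal sketch. This is not the authors' code: the paper works in R with sparklyr on a Spark cluster, whereas the sketch below uses only the Python standard library on a single machine. The variable names (`pm10`, `co`, `humidity`, `unrelated`), the synthetic data, the 0.2 correlation threshold, and the 80/20 split are all assumptions made for illustration and do not come from the paper. Residual analysis and significance testing are not shown.

```python
import random
import statistics


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)


def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def fit_ols(rows, ys):
    """Least-squares coefficients [intercept, b1, ...] via the normal equations."""
    design = [[1.0] + list(r) for r in rows]
    k = len(design[0])
    xtx = [[sum(d[i] * d[j] for d in design) for j in range(k)] for i in range(k)]
    xty = [sum(d[i] * y for d, y in zip(design, ys)) for i in range(k)]
    return solve(xtx, xty)


def predict(coefs, row):
    """Apply fitted coefficients to one row of input values."""
    return coefs[0] + sum(c * v for c, v in zip(coefs[1:], row))


def r_squared(ys, preds):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_y = statistics.fmean(ys)
    ss_res = sum((y - p) ** 2 for y, p in zip(ys, preds))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    return 1.0 - ss_res / ss_tot


# Synthetic stand-in for station data (hypothetical names and values).
rng = random.Random(0)
n = 2000
data = {
    "pm10": [rng.uniform(20, 200) for _ in range(n)],
    "co": [rng.uniform(0.2, 2.0) for _ in range(n)],
    "humidity": [rng.uniform(20, 95) for _ in range(n)],
    "unrelated": [rng.uniform(0, 1) for _ in range(n)],  # no effect on target
}
pm25 = [
    0.5 * p + 30.0 * c + 0.6 * h + rng.gauss(0, 5)
    for p, c, h in zip(data["pm10"], data["co"], data["humidity"])
]

# Step 1: keep inputs whose correlation with PM2.5 on the training part
# exceeds an assumed threshold.
split = int(0.8 * n)
threshold = 0.2
selected = [
    name for name, col in data.items()
    if abs(pearson(col[:split], pm25[:split])) >= threshold
]

# Step 2: fit the multiple linear regression on the training rows.
rows = [[data[name][i] for name in selected] for i in range(n)]
coefs = fit_ols(rows[:split], pm25[:split])

# Step 3: evaluate on the held-out test rows.
test_preds = [predict(coefs, r) for r in rows[split:]]
r2 = r_squared(pm25[split:], test_preds)

print("selected inputs:", selected)
print("test-set R^2:", round(r2, 3))
```

In a sparklyr setting the same steps would run on Spark DataFrames rather than Python lists, with the regression fitted by Spark MLlib. The structure of the procedure (screen, fit, evaluate on unseen data) stays the same.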