Lizhi Liu, Jingwei He, Bei Peng, Min Yang, Chenyue Zhang
Proceedings of the 2019 International Conference on Robotics Systems and Vehicle Technology - RSVT '19. Published 2019-10-16. DOI: https://doi.org/10.1145/3366715.3366722
Research of PM2.5 Real-Time Prediction Model in Spark Cluster Environment
Big data technologies offer new approaches to the statistical prediction of ambient air quality. This paper studies how to build real-time PM2.5 prediction models for monitoring stations using the R language on a Spark cluster. Real-time monitoring-station data stored in a traditional relational database are converted into a target dataset that can be loaded into the cluster and processed with R. A correlation analysis of the pollutants and meteorological parameters that affect PM2.5 is carried out to determine the input variables of the multiple linear regression on which the prediction model is built. In the Spark cluster environment, R uses the sparklyr package and Spark MLlib to build a prediction model for each monitoring station. Each model is evaluated in four ways to assess its effectiveness: residual analysis, significance testing, the coefficient of determination, and prediction on a test set. The experimental results show that the model can predict real-time PM2.5 values accurately.
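The workflow summarized in the abstract (correlation screening of candidate inputs, a multiple linear regression fit, and evaluation by the coefficient of determination on a held-out test set) can be illustrated with a small sketch. This is not the authors' implementation: the paper works in R with sparklyr and Spark MLlib on real station data, whereas the sketch below uses plain Python with synthetic data. The variable names (`pm10`, `co`, `wind_speed`, `humidity`), the generating coefficients, and the 0.3 correlation threshold are all assumptions made for illustration.

```python
import math
import random


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def fit_ols(rows, y):
    """Least squares with intercept via the normal equations (X^T X) b = X^T y."""
    X = [[1.0] + list(r) for r in rows]
    k = len(X[0])
    xtx = [[sum(r[i] * r[j] for r in X) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * t for r, t in zip(X, y)) for i in range(k)]
    return solve(xtx, xty)


def predict(beta, rows):
    """Apply fitted coefficients (intercept first) to each row of inputs."""
    return [beta[0] + sum(b * v for b, v in zip(beta[1:], r)) for r in rows]


def r_squared(y, y_hat):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_y = sum(y) / len(y)
    ss_res = sum((a - b) ** 2 for a, b in zip(y, y_hat))
    ss_tot = sum((a - mean_y) ** 2 for a in y)
    return 1.0 - ss_res / ss_tot


def columns(rows, idx):
    """Keep only the columns listed in idx."""
    return [[r[i] for i in idx] for r in rows]


# Synthetic stand-in for station records; names and coefficients are hypothetical.
rng = random.Random(0)
names = ["pm10", "co", "wind_speed", "humidity"]
data, pm25 = [], []
for _ in range(400):
    row = [rng.uniform(20, 150), rng.uniform(0.3, 2.0),
           rng.uniform(0, 8), rng.uniform(20, 90)]
    data.append(row)
    # humidity is deliberately left out of the generating formula
    pm25.append(30 + 0.4 * row[0] + 25 * row[1] - 5 * row[2] + rng.gauss(0, 3))

train_x, test_x = data[:300], data[300:]
train_y, test_y = pm25[:300], pm25[300:]

# Step 1: correlation screening (the threshold is an assumed value).
THRESHOLD = 0.3
corr = {n: pearson([r[i] for r in train_x], train_y) for i, n in enumerate(names)}
selected = [i for i, n in enumerate(names) if abs(corr[n]) >= THRESHOLD]

# Step 2: multiple linear regression on the screened inputs.
beta = fit_ols(columns(train_x, selected), train_y)

# Step 3: coefficient of determination on the held-out test set.
r2_test = r_squared(test_y, predict(beta, columns(test_x, selected)))

print("correlations:", {n: round(c, 3) for n, c in corr.items()})
print("selected inputs:", [names[i] for i in selected])
print("test R^2:", round(r2_test, 3))
```

Residual analysis and significance testing, the other two evaluation steps named in the abstract, are omitted here to keep the sketch short. In a Spark setting the fit would normally be delegated to the cluster's regression routine rather than solved by hand as above.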