Research of PM2.5 Real-Time Prediction Model in Spark Cluster Environment

Proceedings of the 2019 International Conference on Robotics Systems and Vehicle Technology - RSVT '19
Pub Date: 2019-10-16
DOI: 10.1145/3366715.3366722

Lizhi Liu, Jingwei He, Bei Peng, Min Yang, Chenyue Zhang



Abstract

Big data technologies offer new approaches to the statistical prediction of ambient air quality. This paper studies how to build real-time PM2.5 prediction models for monitoring stations using the R language on a Spark cluster. Real-time monitoring-station data stored in a traditional relational database are converted into a target dataset that can be loaded into the cluster and processed with R. A correlation analysis of the pollutants and meteorological parameters that affect PM2.5 is carried out to determine the input variables of the multiple linear regression used to build the prediction model. In the Spark cluster environment, R uses the sparklyr package and Spark's MLlib library to build a prediction model for each monitoring station. Each model is evaluated in four respects to justify its effectiveness: residual analysis, significance testing, the coefficient of determination, and prediction on a test set. The experimental results show that the model can predict real-time PM2.5 values accurately.
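The modelling steps named in the abstract (correlation screening of candidate inputs, a multiple linear regression fit, and evaluation by coefficient of determination on a held-out test set) can be illustrated with a minimal sketch. This is not the authors' code: it uses plain Python instead of R with sparklyr on a Spark cluster so that it runs without any cluster, the data are synthetic, and the variable names (`pm10`, `co`, `wind_speed`, `humidity`), the generating coefficients, and the 0.2 correlation threshold are all assumptions made for illustration. Residual analysis and significance testing, the other two evaluation aspects, are not covered.

```python
import math
import random


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def fit_ols(rows, y):
    """Least-squares fit via the normal equations; returns [intercept, b1, b2, ...]."""
    X = [[1.0] + list(r) for r in rows]
    p = len(X[0])
    xtx = [[sum(r[i] * r[j] for r in X) for j in range(p)] for i in range(p)]
    xty = [sum(r[i] * yi for r, yi in zip(X, y)) for i in range(p)]
    return solve(xtx, xty)


def predict(coef, rows):
    return [coef[0] + sum(c * v for c, v in zip(coef[1:], r)) for r in rows]


def r_squared(y, y_hat):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean = sum(y) / len(y)
    ss_res = sum((a - b) ** 2 for a, b in zip(y, y_hat))
    ss_tot = sum((a - mean) ** 2 for a in y)
    return 1.0 - ss_res / ss_tot


# Synthetic station data (hypothetical variables and coefficients, not from the paper).
rng = random.Random(0)
n = 500
data = {
    "pm10": [rng.uniform(20, 200) for _ in range(n)],
    "co": [rng.uniform(0.3, 3.0) for _ in range(n)],
    "wind_speed": [rng.uniform(0, 10) for _ in range(n)],
    "humidity": [rng.uniform(20, 90) for _ in range(n)],  # unrelated to PM2.5 here
}
pm25 = [
    5 + 0.6 * data["pm10"][i] + 20 * data["co"][i] - 4 * data["wind_speed"][i]
    + rng.gauss(0, 3)
    for i in range(n)
]

# Step 1: keep only candidate inputs whose |correlation| with PM2.5 passes the threshold.
THRESHOLD = 0.2  # assumed cut-off
selected = [k for k, v in data.items() if abs(pearson(v, pm25)) >= THRESHOLD]

# Step 2: fit the multiple linear regression on the first 400 rows.
rows = [[data[k][i] for k in selected] for i in range(n)]
train_rows, test_rows = rows[:400], rows[400:]
train_y, test_y = pm25[:400], pm25[400:]
coef = fit_ols(train_rows, train_y)

# Step 3: evaluate by coefficient of determination on the held-out 100 rows.
r2 = r_squared(test_y, predict(coef, test_rows))
print("selected inputs:", selected)
print("coefficients:", dict(zip(["intercept"] + selected, coef)))
print("test R^2:", r2)
```

In a sparklyr setting the fit itself would typically be delegated to Spark's linear regression rather than solved by hand; the normal-equation solver here only keeps the sketch dependency-free.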


