Research of PM2.5 Real-Time Prediction Model in Spark Cluster Environment

Lizhi Liu, Jingwei He, Bei Peng, Min Yang, Chenyue Zhang
Published in: Proceedings of the 2019 International Conference on Robotics Systems and Vehicle Technology (RSVT '19), 2019-10-16
DOI: https://doi.org/10.1145/3366715.3366722

Abstract

Big data technologies offer new approaches to the statistical prediction of ambient air quality. This paper studies how to build a real-time PM2.5 prediction model for monitoring stations using the R language on a Spark cluster. Real-time monitoring-station data stored in a traditional relational database are converted into a target dataset that can be loaded into the cluster and processed with R. A correlation analysis of the pollutants and meteorological parameters that affect PM2.5 is carried out to determine the input variables of the multiple linear regression used to build the real-time prediction model. In the Spark cluster environment, the sparklyr package and Spark MLlib are used from R to construct a prediction model for each monitoring station. Each model is evaluated in four respects (residual analysis, significance testing, the coefficient of determination, and test-set prediction) to verify its effectiveness. The experimental results show that the model can predict real-time PM2.5 values accurately.
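The workflow summarized in the abstract (screen candidate variables by correlation, fit a multiple linear regression, then check the coefficient of determination on a held-out test set) can be illustrated with a small sketch. This is not the authors' implementation: the paper uses R with sparklyr and Spark MLlib on a cluster, whereas the sketch below is plain standard-library Python run on synthetic data. The variable names (`pm10`, `co`, `wind_speed`, `unrelated`), the generating coefficients, the 0.15 correlation threshold, and the 80/20 split are all assumptions chosen for illustration. Residual analysis and significance testing are not covered.

```python
import random
from statistics import mean


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5


def solve_linear_system(a, b):
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        partial = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - partial) / m[i][i]
    return x


def fit_ols(rows, y):
    """Least-squares coefficients (intercept first) via the normal equations."""
    design = [[1.0] + list(r) for r in rows]
    k = len(design[0])
    xtx = [[sum(d[i] * d[j] for d in design) for j in range(k)] for i in range(k)]
    xty = [sum(d[i] * t for d, t in zip(design, y)) for i in range(k)]
    return solve_linear_system(xtx, xty)


def predict(coef, rows):
    """Apply fitted coefficients (intercept first) to each input row."""
    return [coef[0] + sum(c * v for c, v in zip(coef[1:], r)) for r in rows]


def r_squared(y, y_hat):
    """Coefficient of determination of predictions y_hat against y."""
    y_bar = mean(y)
    ss_res = sum((a - b) ** 2 for a, b in zip(y, y_hat))
    ss_tot = sum((a - y_bar) ** 2 for a in y)
    return 1.0 - ss_res / ss_tot


# Synthetic stand-in for one station's records (hypothetical, not real data).
random.seed(0)
n = 1000
data = {
    "pm10": [random.uniform(20, 200) for _ in range(n)],
    "co": [random.uniform(0.2, 2.0) for _ in range(n)],
    "wind_speed": [random.uniform(0, 10) for _ in range(n)],
    "unrelated": [random.uniform(0, 1) for _ in range(n)],
}
pm25 = [
    5 + 0.6 * p + 30 * c - 4 * w + random.gauss(0, 5)
    for p, c, w in zip(data["pm10"], data["co"], data["wind_speed"])
]

# Step 1: correlation analysis selects the regression inputs.
THRESHOLD = 0.15  # assumed cut-off, not taken from the paper
selected = [name for name, col in data.items()
            if abs(pearson(col, pm25)) >= THRESHOLD]

# Step 2: fit on the first 80% of records, hold out the rest as a test set.
rows = list(zip(*(data[name] for name in selected)))
split = int(0.8 * n)
coef = fit_ols(rows[:split], pm25[:split])

# Step 3: evaluate by the coefficient of determination on the test set.
test_r2 = r_squared(pm25[split:], predict(coef, rows[split:]))
print("selected inputs:", selected)
print("test-set R^2:", round(test_r2, 3))
```

In a sparklyr setting the fitting step would be delegated to Spark MLlib rather than solved by hand; the hand-written normal-equation solver here only keeps the sketch free of external dependencies.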