The Application of Machine Learning Models in the Prediction of PM2.5/PM10 Concentration

Xinzhi Lin
Proceedings of the 2021 4th International Conference on Computers in Management and Business
Published: 2021-01-30
DOI: 10.1145/3450588.3450605 (https://doi.org/10.1145/3450588.3450605)
Citations: 3

Abstract

The world economy and science are developing rapidly, and Beijing is experiencing chronic air pollution. Air quality affects how people travel, how enterprises develop, and whether traffic runs normally. PM2.5 and PM10 are the main components of this pollution, so predicting their concentrations in the air is valuable [1]. Traditional models such as basic linear regression have been proposed for predicting PM2.5/PM10 concentrations, but they use few predictor variables and run with low efficiency and low accuracy. In the big data era, a model needs to handle large and varied data sets. With adequate data from different meteorological stations in Beijing, we can use a richer set of variables, such as SO2 and NO2 mass, wind direction, and other weather observations, to predict PM2.5/PM10 concentrations. We build machine learning models with higher efficiency, higher accuracy, and stronger learning ability. The primary algorithms are multiple linear regression, decision trees, boosting and random forests (both based on decision trees), and neural networks. The results indicate that the models based on neural networks and ensemble learning give the better predictions. Boosting performs best among these models, achieving an R-squared of 84.2% for PM2.5 and 75.7% for PM10 on the test set.
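To make the evaluation described in the abstract concrete, the following is a minimal sketch of boosting over decision trees scored by R-squared on a held-out test set. It is not the paper's implementation: the abstract does not say which boosting variant was used, so gradient boosting with squared loss and depth-1 trees (stumps) is an assumption. The data is synthetic, and the feature names `no2` and `wind_speed`, the coefficients, and all hyperparameters are hypothetical.

```python
import random


def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def fit_stump(rows, residuals):
    """Find the single (feature, threshold) split that minimises squared error."""
    best = None
    for j in range(len(rows[0])):
        values = sorted({row[j] for row in rows})
        for threshold in values[:-1]:  # dropping the max keeps both sides non-empty
            left = [r for row, r in zip(rows, residuals) if row[j] <= threshold]
            right = [r for row, r in zip(rows, residuals) if row[j] > threshold]
            left_mean = sum(left) / len(left)
            right_mean = sum(right) / len(right)
            sse = (sum((r - left_mean) ** 2 for r in left)
                   + sum((r - right_mean) ** 2 for r in right))
            if best is None or sse < best[0]:
                best = (sse, j, threshold, left_mean, right_mean)
    return best[1:]


def fit_boosting(rows, targets, n_rounds=80, learning_rate=0.2):
    """Gradient boosting for squared loss: each stump fits the current residuals."""
    base = sum(targets) / len(targets)
    preds = [base] * len(targets)
    stumps = []
    for _ in range(n_rounds):
        residuals = [t - p for t, p in zip(targets, preds)]
        j, thr, left_mean, right_mean = fit_stump(rows, residuals)
        stumps.append((j, thr, left_mean, right_mean))
        preds = [p + learning_rate * (left_mean if row[j] <= thr else right_mean)
                 for row, p in zip(rows, preds)]
    return base, learning_rate, stumps


def predict(model, row):
    """Sum the shrunken stump outputs on top of the constant base prediction."""
    base, learning_rate, stumps = model
    return base + learning_rate * sum(
        left_mean if row[j] <= thr else right_mean
        for j, thr, left_mean, right_mean in stumps)


# Synthetic stand-in for station data: columns are hypothetical (no2, wind_speed).
rng = random.Random(0)
rows = [[rng.uniform(0, 100), rng.uniform(0, 10)] for _ in range(160)]
targets = [0.8 * no2 - 3.0 * wind + rng.gauss(0, 5) for no2, wind in rows]

# Hold out the last 40 rows as the test set.
train_rows, test_rows = rows[:120], rows[120:]
train_y, test_y = targets[:120], targets[120:]

model = fit_boosting(train_rows, train_y)
test_r2 = r_squared(test_y, [predict(model, row) for row in test_rows])
print(f"Test R-squared: {test_r2:.3f}")
```

An R-squared of 84.2%, as reported for PM2.5, would correspond to `r_squared` returning 0.842 on the test set: the model's squared error is 15.8% of the squared error of always predicting the test-set mean. In practice a library implementation with a richer feature set would replace the hand-written stumps here.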