The Application of Machine Learning Models in the Prediction of PM2.5/PM10 Concentration

Proceedings of the 2021 4th International Conference on Computers in Management and Business. Published: 2021-01-30. DOI: 10.1145/3450588.3450605

Xinzhi Lin


Citations: 3

Abstract

The world economy and science are developing rapidly, yet Beijing suffers from chronic air pollution. Air quality affects people's travel, the development of enterprises, and the normal operation of traffic. PM2.5 and PM10 are the main pollutants responsible for this air pollution, so predicting their concentrations in the air is valuable [1]. Traditional models such as basic linear regression have been proposed to predict PM2.5/PM10 levels, but they use only a few predictor variables and run with low efficiency and low accuracy. In the big data era, models are needed that can handle large and varied datasets. Using the ample data collected at different meteorological stations in Beijing, we can draw on a richer set of variables, such as the mass of SO2 and NO2, wind direction, and other weather observations, to predict PM2.5/PM10 concentrations. We build machine learning models with higher efficiency, higher accuracy, and stronger learning ability; the main algorithms are multiple linear regression, decision trees, boosting and random forests (both built on decision trees), and neural networks. The results show that the models based on neural networks and ensemble learning give the best predictions. Boosting performs best of all, achieving an R-squared of 84.2% on the test set for PM2.5 and 75.7% for PM10.
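The abstract compares a linear baseline with tree-based boosting using R-squared on a held-out test set, where R^2 = 1 - SS_res / SS_tot. The sketch below illustrates that kind of comparison in dependency-free Python. It is not the author's code: the paper does not say which boosting variant was used, so this swaps in plain gradient boosting with decision stumps under squared loss, and the data are hypothetical synthetic values (the feature names `so2`, `no2`, `wind` and the made-up relation between them are assumptions), not the Beijing station data.

```python
import random

def r2_score(y_true, y_pred):
    """Coefficient of determination: R^2 = 1 - SS_res / SS_tot."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot

def fit_linear(X, y):
    """Multiple linear regression via the normal equations (A^T A) b = A^T y."""
    A = [[1.0] + list(row) for row in X]  # prepend an intercept column
    k = len(A[0])
    # Augmented system [A^T A | A^T y].
    M = [[sum(a[i] * a[j] for a in A) for j in range(k)]
         + [sum(a[i] * t for a, t in zip(A, y))] for i in range(k)]
    # Gauss-Jordan elimination with partial pivoting.
    for c in range(k):
        p = max(range(c, k), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(k):
            if r != c:
                f = M[r][c] / M[c][c]
                M[r] = [mr - f * mc for mr, mc in zip(M[r], M[c])]
    coef = [M[i][k] / M[i][i] for i in range(k)]
    return lambda row: coef[0] + sum(b * x for b, x in zip(coef[1:], row))

def fit_stump(X, residuals):
    """Best single split (feature, threshold) under squared loss."""
    n, total, best = len(X), sum(residuals), None
    for j in range(len(X[0])):
        order = sorted(range(n), key=lambda i: X[i][j])
        left_sum = 0.0
        for pos in range(1, n):
            left_sum += residuals[order[pos - 1]]
            lo, hi = X[order[pos - 1]][j], X[order[pos]][j]
            if lo == hi:
                continue  # no threshold separates equal values
            nl, nr = pos, n - pos
            right_sum = total - left_sum
            # Maximizing this term minimizes the split's squared error.
            gain = left_sum ** 2 / nl + right_sum ** 2 / nr
            if best is None or gain > best[0]:
                best = (gain, j, (lo + hi) / 2, left_sum / nl, right_sum / nr)
    _, j, thr, lv, rv = best
    return lambda row: lv if row[j] <= thr else rv

def fit_boosting(X, y, n_rounds=100, learning_rate=0.1):
    """Gradient boosting with stumps: each stump fits the current residuals."""
    base = sum(y) / len(y)
    preds, stumps = [base] * len(y), []
    for _ in range(n_rounds):
        residuals = [t - p for t, p in zip(y, preds)]  # negative gradient of squared loss
        stump = fit_stump(X, residuals)
        stumps.append(stump)
        preds = [p + learning_rate * stump(row) for p, row in zip(preds, X)]
    return lambda row: base + learning_rate * sum(s(row) for s in stumps)

def make_data(n, seed=0):
    """Hypothetical synthetic data, NOT the paper's Beijing station data."""
    rng = random.Random(seed)
    X, y = [], []
    for _ in range(n):
        so2, no2, wind = rng.uniform(0, 50), rng.uniform(0, 100), rng.uniform(0, 10)
        # Made-up nonlinear relation: particles accumulate when wind is weak.
        pm = 0.8 * no2 + 0.5 * so2 + (40 if wind < 3 else 0) + rng.gauss(0, 5)
        X.append([so2, no2, wind])
        y.append(pm)
    return X, y

if __name__ == "__main__":
    X, y = make_data(400)
    X_train, y_train, X_test, y_test = X[:300], y[:300], X[300:], y[300:]
    for name, fit in [("linear", fit_linear), ("boosting", fit_boosting)]:
        model = fit(X_train, y_train)
        print(name, round(r2_score(y_test, [model(r) for r in X_test]), 3))
```

The step in wind speed is the kind of threshold effect a tree can split on directly but a single linear term cannot, which is one plausible reason tree ensembles can outperform linear regression on such data. Random forests and neural networks from the paper are not reproduced here.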
