Effect of traffic data set on various machine-learning algorithms when forecasting air quality

IF 2.7 Q1 ENGINEERING, MULTIDISCIPLINARY

Journal of Engineering Design and Technology. Pub Date: 2022-05-26. DOI: 10.1108/jedt-10-2021-0554

Ismail Sulaimon, H. Alaka, Razak Olu-Ajayi, Mubashir Ahmad, Saheed Ajayi, Abdul Hye


Citations: 2

Abstract

Purpose
Road traffic emissions are generally believed to contribute immensely to air pollution, but the effect of road traffic data sets on air quality (AQ) predictions has not been fully investigated. This paper aims to investigate the effect that traffic data sets have on the performance of machine learning (ML) predictive models in AQ prediction.

Design/methodology/approach
To achieve this, the authors set up an experiment in which the control data set contains only the AQ data set and the meteorological (Met) data set, while the experimental data set comprises the AQ data set, the Met data set and the traffic data set. Several ML models (such as extra trees regressor, eXtreme gradient boosting regressor, random forest regressor, K-neighbors regressor and two others) were trained, tested and compared on each of these combinations of data sets to predict the volume of PM2.5, PM10, NO2 and O3 in the atmosphere at various times of the day.

Findings
The results showed that the various ML algorithms react differently to the traffic data set, although the traffic data set generally improved the performance of all the ML algorithms considered in this study by at least 20% and reduced their error by at least 18.97%.

Research limitations/implications
This research is limited in terms of the study area, and the results cannot be generalized outside the UK, as some of the inherent conditions may differ elsewhere. Additionally, only the ML algorithms commonly used in the literature are considered in this research, which leaves out a few other ML algorithms.

Practical implications
This study reinforces the belief that the traffic data set has a significant effect on improving the performance of air pollution ML prediction models. There is also an indication that ML algorithms behave differently when trained with a form of traffic data set in the development of an AQ prediction model. This implies that developers and researchers in AQ prediction need to identify the ML algorithms that best serve their purposes before implementation.

Originality/value
The results of this study will enable researchers to focus more on the beneficial algorithms when using traffic data sets in AQ prediction.
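The control-versus-experimental comparison described under Design/methodology/approach can be illustrated with a minimal sketch. This is not the authors' code or data. The rows are synthetic, the relationship between traffic, wind and PM2.5 is assumed purely for illustration, and a small pure-Python nearest-neighbour regressor stands in for the K-neighbors regressor named in the abstract. The function names (`make_synthetic_rows`, `knn_predict`, `compare_feature_sets`) are hypothetical, and the relative error reduction formula is an assumption, since the abstract does not state how its percentages were computed.

```python
import random


def make_synthetic_rows(n, seed=0):
    """Build hypothetical hourly records: (temperature, wind, traffic, pm25)."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        temp = rng.random()     # meteorological feature, scaled to [0, 1)
        wind = rng.random()     # meteorological feature, scaled to [0, 1)
        traffic = rng.random()  # traffic count, scaled to [0, 1)
        # Assumed relationship (illustration only): traffic raises PM2.5,
        # wind disperses it, plus Gaussian measurement noise.
        pm25 = 10.0 + 30.0 * traffic - 10.0 * wind + rng.gauss(0.0, 1.0)
        rows.append((temp, wind, traffic, pm25))
    return rows


def knn_predict(train_x, train_y, query, k=5):
    """Average the targets of the k nearest training points (Euclidean)."""
    order = sorted(
        range(len(train_x)),
        key=lambda i: sum((a - b) ** 2 for a, b in zip(train_x[i], query)),
    )
    nearest = order[:k]
    return sum(train_y[i] for i in nearest) / len(nearest)


def mean_absolute_error(train, test, feature_idx, k=5):
    """Fit on the chosen feature columns and score on the held-out rows."""
    train_x = [[row[i] for i in feature_idx] for row in train]
    train_y = [row[-1] for row in train]
    errors = []
    for row in test:
        query = [row[i] for i in feature_idx]
        errors.append(abs(knn_predict(train_x, train_y, query, k) - row[-1]))
    return sum(errors) / len(errors)


def compare_feature_sets(n_train=300, n_test=100, seed=0):
    """Return (control MAE, experimental MAE, relative error reduction in %)."""
    rows = make_synthetic_rows(n_train + n_test, seed)
    train, test = rows[:n_train], rows[n_train:]
    # Control: meteorological columns only. Experimental: Met plus traffic.
    mae_control = mean_absolute_error(train, test, feature_idx=(0, 1))
    mae_experimental = mean_absolute_error(train, test, feature_idx=(0, 1, 2))
    reduction_pct = 100.0 * (mae_control - mae_experimental) / mae_control
    return mae_control, mae_experimental, reduction_pct


if __name__ == "__main__":
    ctrl, exp, pct = compare_feature_sets()
    print(f"control MAE={ctrl:.2f}, experimental MAE={exp:.2f}, "
          f"error reduction={pct:.1f}%")
```

Repeating the same comparison for each candidate algorithm and each pollutant would show, as the abstract reports for the real data, whether some algorithms gain more from the traffic columns than others. The numbers this sketch produces reflect only the synthetic data and say nothing about the study's results.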


Source journal

Journal of Engineering Design and Technology (Engineering, Multidisciplinary)

CiteScore: 6.50

Self-citation rate: 21.40%

Journal scope:
- Design strategies
- Usability and adaptability
- Material, component and systems performance
- Process control
- Alternative and new technologies
- Organizational, management and research issues
- Human factors
- Environmental, quality and health and safety issues
- Cost and life cycle issues
- Sustainability criteria, indicators, measurement and practices
- Risk management
- Entrepreneurship
- Law, regulation and governance
- Design, implementing, managing and practicing innovation
- Visualization, simulation, information and communication technologies
- Education practices, innovation, strategies and policy issues