Ismail Sulaimon, H. Alaka, Razak Olu-Ajayi, Mubashir Ahmad, Saheed Ajayi, Abdul Hye
Effect of traffic data set on various machine-learning algorithms when forecasting air quality
Journal of Engineering Design and Technology (Journal Article), published 2022-05-26. Impact factor: 2.6; JCR: Q1 (Engineering, Multidisciplinary).
DOI: 10.1108/jedt-10-2021-0554 (https://doi.org/10.1108/jedt-10-2021-0554)
Citations: 2
Abstract
Purpose
Road traffic emissions are generally believed to contribute immensely to air pollution, but the effect of road traffic data sets on air quality (AQ) predictions has not been fully investigated. This paper aims to investigate the effect that traffic data sets have on the performance of machine learning (ML) predictive models in AQ prediction.
Design/methodology/approach
To achieve this, the authors set up an experiment in which the control data set comprises only the AQ and meteorological (Met) data sets, while the experimental data set comprises the AQ, Met and traffic data sets. Several ML models (such as extra trees regressor, eXtreme gradient boosting regressor, random forest regressor, K-neighbors regressor and two others) were trained, tested and compared on each of these data set combinations to predict the volume of PM2.5, PM10, NO2 and O3 in the atmosphere at various times of the day.
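The control-versus-experimental comparison described above can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the rows are synthetic, the feature names (`prev_pm25`, `wind`, `traffic`) and the relationship between them are invented, and a hand-written k-nearest-neighbours regressor stands in for the library models used in the study. It shows only the shape of the experiment, not the authors' pipeline or results.

```python
import math
import random


def make_synthetic_rows(n, seed=0):
    """Generate made-up hourly rows: (prev_pm25, wind, traffic, pm25)."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        prev_pm25 = rng.uniform(0.0, 1.0)  # scaled lagged AQ reading (hypothetical)
        wind = rng.uniform(0.0, 1.0)       # scaled Met feature (hypothetical)
        traffic = rng.uniform(0.0, 1.0)    # scaled traffic feature (hypothetical)
        # Assumed relationship, chosen only so that traffic carries signal.
        pm25 = 3.0 * prev_pm25 - 2.0 * wind + 5.0 * traffic + rng.gauss(0.0, 0.2)
        rows.append((prev_pm25, wind, traffic, pm25))
    return rows


def knn_predict(train_x, train_y, query, k=5):
    """Average the targets of the k training points nearest to the query."""
    nearest = sorted(
        range(len(train_x)),
        key=lambda i: math.dist(train_x[i], query),
    )[:k]
    return sum(train_y[i] for i in nearest) / k


def rmse_for_features(train, test, feature_idx, k=5):
    """Use only the chosen feature columns and return the test-set RMSE."""
    train_x = [tuple(row[i] for i in feature_idx) for row in train]
    train_y = [row[-1] for row in train]
    squared_errors = []
    for row in test:
        query = tuple(row[i] for i in feature_idx)
        pred = knn_predict(train_x, train_y, query, k)
        squared_errors.append((pred - row[-1]) ** 2)
    return math.sqrt(sum(squared_errors) / len(squared_errors))


rows = make_synthetic_rows(400, seed=42)
train, test = rows[:300], rows[300:]

# Control: AQ + Met features only. Experimental: AQ + Met + traffic.
control_rmse = rmse_for_features(train, test, feature_idx=(0, 1))
experimental_rmse = rmse_for_features(train, test, feature_idx=(0, 1, 2))

print(f"control RMSE (AQ + Met):                {control_rmse:.3f}")
print(f"experimental RMSE (AQ + Met + traffic): {experimental_rmse:.3f}")
```

Holding the train/test split and the model fixed while changing only the feature columns isolates the contribution of the traffic feature, which is the point of having a control data set. Repeating the same two calls for each model would give the per-algorithm comparison the study describes.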
Findings
The results showed that the ML algorithms react differently to the traffic data set, although it generally improved the performance of all the ML algorithms considered in this study by at least 20% and reduced their error by at least 18.97%.
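The abstract does not state how the percentage figures were computed. One common reading, assumed here, is the relative change between the control and experimental error of the same model. The function name and the input values below are hypothetical and are not taken from the paper.

```python
def percent_error_reduction(control_error, experimental_error):
    """Relative error reduction, in percent, of experimental versus control."""
    if control_error <= 0:
        raise ValueError("control_error must be positive")
    return (control_error - experimental_error) / control_error * 100.0


# Made-up error values for one model, for illustration only.
reduction = percent_error_reduction(control_error=10.0, experimental_error=8.0)
print(f"error reduction: {reduction:.2f}%")
```

A positive value means the traffic data set lowered the error for that model, and a negative value means it made the error worse.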
Research limitations/implications
This research is limited in terms of the study area, and the results cannot be generalized outside of the UK, as some of the inherent conditions may not be similar elsewhere. Additionally, only the ML algorithms commonly used in the literature are considered in this research, which leaves out a few other ML algorithms.
Practical implications
This study reinforces the belief that the traffic data set has a significant effect on improving the performance of air pollution ML prediction models. It also indicates that ML algorithms behave differently when trained with a form of traffic data set in the development of an AQ prediction model. This implies that developers and researchers in AQ prediction need to identify the ML algorithms that best serve their purposes before implementation.
Originality/value
The result of this study will enable researchers to focus more on algorithms of benefit when using traffic data sets in AQ prediction.
About the journal:
- Design strategies
- Usability and adaptability
- Material, component and systems performance
- Process control
- Alternative and new technologies
- Organizational, management and research issues
- Human factors
- Environmental, quality and health and safety issues
- Cost and life cycle issues
- Sustainability criteria, indicators, measurement and practices
- Risk management
- Entrepreneurship
- Law, regulation and governance
- Design, implementing, managing and practicing innovation
- Visualization, simulation, information and communication technologies
- Education practices, innovation, strategies and policy issues