Ensemble of ensembles for fine particulate matter pollution prediction using big data analytics and IoT emission sensors

Impact factor: 2.6; JCR quartile: Q1 (Engineering, Multidisciplinary)
Christian Nnaemeka Egwim, Hafiz Alaka, Youlu Pan, Habeeb Balogun, Saheed Ajayi, Abdul Hye, Oluwapelumi Oluwaseun Egunjobi
Journal of Engineering Design and Technology. Journal article, published 2023-11-07. DOI: https://doi.org/10.1108/jedt-07-2022-0379
Citations: 0

Abstract

Ensemble of ensembles for fine particulate matter pollution prediction using big data analytics and IoT emission sensors
Purpose: The study aims to develop a highly effective multilayer ensemble-of-ensembles predictive model (a stacking ensemble) from several hyperparameter-optimized ensemble machine learning (ML) methods (bagging and boosting ensembles), trained on high-volume data points retrieved from Internet of Things (IoT) emission sensors together with time-corresponding meteorology and traffic data.

Design/methodology/approach: First, the study tested the big data hypothesis by developing sample ensemble predictive models on different data sample sizes and comparing their results. Second, it developed a standalone model and several bagging and boosting ensemble models and compared their results. Finally, it used the best-performing bagging and boosting predictive models as input estimators to develop a novel, highly effective multilayer stacking ensemble predictive model.

Findings: First, the results showed data size to be one of the main determinants of ensemble ML predictive power. Second, they showed that, compared with using a single algorithm, the cumulative result from ensemble ML algorithms is usually better in terms of predictive accuracy. Finally, they showed the stacking ensemble to be a better model for predicting PM2.5 concentration levels than the bagging and boosting ensemble models.

Research limitations/implications: A limitation of this study is the trade-off between the performance of this novel model and the computational time required to train it. Whether this gap can be closed remains an open research question that future research should address. Future studies could also integrate the model into a personal air quality messaging system to inform the public of pollution levels and improve public access to air quality forecasts.

Practical implications: When integrated into an air pollution monitoring system, the outcome of this study will help the public proactively identify highly polluted areas, potentially reducing deaths, complications and transmission of pollution-associated or pollution-triggered COVID-19 (and other lung diseases) by encouraging avoidance behavior, and will support informed lockdown decisions by government bodies.

Originality/value: First, this study fills a gap in the literature by providing a justification for selecting appropriate ensemble ML algorithms for PM2.5 concentration level predictive modeling. Second, it contributes to the big data hypothesis, which suggests that data size is one of the most important factors in ML predictive capability. Third, it supports the premise that the cumulative output of ensemble ML algorithms is usually better in terms of predictive accuracy than the output of a single algorithm. Finally, it develops a novel, high-performing, hyperparameter-optimized multilayer ensemble-of-ensembles predictive model that can accurately predict PM2.5 concentration levels with improved model interpretability and enhanced generalizability, and it provides a novel databank of historic pollution data from IoT emission sensors that can be purchased for research, consultancy and policymaking.
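
The comparison described in the abstract (a standalone model versus bagging, boosting and a stacking ensemble built on top of them) can be illustrated with a minimal sketch. This is not the authors' pipeline: the abstract does not name the algorithms, hyperparameters or features used, so the scikit-learn estimators, the synthetic data generator, and the function names `make_synthetic_data`, `build_models` and `evaluate` below are all assumptions chosen for illustration, and hyperparameter optimization is omitted.

```python
import numpy as np
from sklearn.ensemble import (
    BaggingRegressor,
    GradientBoostingRegressor,
    StackingRegressor,
)
from sklearn.linear_model import RidgeCV
from sklearn.metrics import r2_score
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeRegressor


def make_synthetic_data(n_samples=2000, seed=0):
    """Hypothetical stand-in for sensor, meteorology and traffic features."""
    rng = np.random.default_rng(seed)
    X = rng.uniform(0.0, 1.0, size=(n_samples, 4))
    # Made-up nonlinear target playing the role of a PM2.5 concentration.
    y = (
        30.0 * X[:, 0]
        + 15.0 * np.sin(2 * np.pi * X[:, 1])
        + 10.0 * X[:, 2] * X[:, 3]
        + rng.normal(0.0, 2.0, size=n_samples)
    )
    return X, y


def build_models(seed=0):
    """One standalone model, one bagging, one boosting, one stacking ensemble."""
    bagging = BaggingRegressor(n_estimators=50, random_state=seed)
    boosting = GradientBoostingRegressor(random_state=seed)
    # The stacking layer trains a meta-learner on cross-validated predictions
    # of the two base ensembles (an assumed, simplified two-layer design).
    stacking = StackingRegressor(
        estimators=[("bagging", bagging), ("boosting", boosting)],
        final_estimator=RidgeCV(),
        cv=5,
    )
    return {
        "single_tree": DecisionTreeRegressor(random_state=seed),
        "bagging": bagging,
        "boosting": boosting,
        "stacking": stacking,
    }


def evaluate(n_samples=2000, seed=0):
    """Fit every model on a train split and return held-out R^2 scores."""
    X, y = make_synthetic_data(n_samples, seed)
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.25, random_state=seed
    )
    scores = {}
    for name, model in build_models(seed).items():
        model.fit(X_train, y_train)
        scores[name] = r2_score(y_test, model.predict(X_test))
    return scores


if __name__ == "__main__":
    # Repeating the comparison at several sample sizes mirrors, very loosely,
    # the data-size experiment mentioned in the abstract.
    for n in (200, 2000):
        print(n, evaluate(n_samples=n))
```

On synthetic data like this the ranking of the models says nothing about real PM2.5 prediction; the sketch only shows how the base ensembles feed a stacking layer and how the same evaluation can be repeated at different data sizes.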
Source journal
Journal of Engineering Design and Technology (ENGINEERING, MULTIDISCIPLINARY)
CiteScore: 6.50
Self-citation rate: 21.40%
Articles published: 67
Journal scope:
- Design strategies
- Usability and adaptability
- Material, component and systems performance
- Process control
- Alternative and new technologies
- Organizational, management and research issues
- Human factors
- Environmental, quality and health and safety issues
- Cost and life cycle issues
- Sustainability criteria, indicators, measurement and practices
- Risk management
- Entrepreneurship
- Law, regulation and governance
- Design, implementing, managing and practicing innovation
- Visualization, simulation, information and communication technologies
- Education practices, innovation, strategies and policy issues