Prognosis of air quality index and air pollution using machine learning techniques.

Impact factor 3.8; Zone 2 (comprehensive journals); JCR Q1, Multidisciplinary Sciences
Mostafa M Abdelmalek, Hatem Mahmoud, Hassan Shokry
Scientific Reports 15(1), 25890. Published 2025-07-17. Journal article. DOI: 10.1038/s41598-025-11260-y (https://doi.org/10.1038/s41598-025-11260-y)
Citations: 0

Abstract


Air pollution poses a significant challenge for both public health and environmental sustainability. Pollutants such as particulate matter (PM), O3, NO2, SO2, and CO cause serious health problems and ecological damage. This study develops and compares five machine learning (ML) models for predicting the Air Quality Index (AQI): Gaussian Process Regression (GPR), Ensemble Regression (ER), Support Vector Machine (SVM), Regression Tree (RT), and Kernel Approximation Regression (KAR). The publicly available historical air pollution dataset, collected from 1 January to 31 December 2022, was obtained from the online source titled 'A Real-time Dataset of Air Pollution Monitoring Generated Using IoT-Mendeley Data', developed by the Department of Software Engineering, Daffodil International University. Although the dataset includes six pollutants (PM10, PM2.5, NO2, SO2, CO, and O3), only three (PM2.5, PM10, and CO) were selected for AQI prediction, based on their higher feature importance as determined using the Random Forest technique. To reduce the time and cost of measuring and analyzing these pollutants, the five ML models were used to predict the AQI from these three features alone. The GPR, ER, SVM, and RT models achieved over 96% accuracy in AQI prediction, whereas the KAR model was less accurate, at 82.36%. In the comparative analysis, the GPR model outperformed the other models, with the lowest Root Mean Square Error (RMSE): 0.87 in training and 1.219 in testing. The findings highlight the value of ML in enhancing air quality prediction and monitoring, offering accurate tools for hourly data analysis and potential real-time application. Such tools can assist in devising more efficient air pollution control strategies, contributing to improved public health and environmental sustainability.
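
The abstract ranks the models by RMSE but does not spell out the computation. The following is a minimal sketch, assuming the standard definition RMSE = sqrt(mean((observed - predicted)^2)). The function names `rmse` and `best_model`, the model labels, and all numeric values are hypothetical and chosen for illustration; none of them come from the paper or its dataset.

```python
import math
from typing import Dict, Sequence, Tuple


def rmse(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Root Mean Square Error: sqrt(mean((y_true - y_pred)^2))."""
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    squared = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return math.sqrt(sum(squared) / len(squared))


def best_model(predictions: Dict[str, Sequence[float]],
               y_true: Sequence[float]) -> Tuple[str, float]:
    """Return (name, rmse) of the model with the lowest RMSE on y_true."""
    scores = {name: rmse(y_true, y_pred) for name, y_pred in predictions.items()}
    name = min(scores, key=scores.get)
    return name, scores[name]


# Toy AQI values and made-up predictions (assumptions, NOT results from the paper).
aqi_observed = [50.0, 100.0, 150.0, 200.0]
toy_predictions = {
    "model_a": [51.0, 99.0, 151.0, 199.0],  # every error has magnitude 1
    "model_b": [53.0, 97.0, 153.0, 197.0],  # every error has magnitude 3
}

name, score = best_model(toy_predictions, aqi_observed)
print(name, score)
```

In a comparison like the one the abstract describes, the same function would be applied separately to training and held-out test predictions, since a model with a low training RMSE but a much higher test RMSE is likely overfitting.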

Source journal
Scientific Reports (Natural Science Disciplines)
CiteScore: 7.50
Self-citation rate: 4.30%
Articles published: 19567
Review time: 3.9 months
About the journal: We publish original research from all areas of the natural sciences, psychology, medicine and engineering. You can learn more about what we publish by browsing our specific scientific subject areas below or explore Scientific Reports by browsing all articles and collections. Scientific Reports has a 2-year impact factor of 4.380 (2021) and is the 6th most-cited journal in the world, with more than 540,000 citations in 2020 (Clarivate Analytics, 2021).
• Engineering: Engineering covers all aspects of engineering, technology, and applied science. It plays a crucial role in the development of technologies to address some of the world's biggest challenges, helping to save lives and improve the way we live.
• Physical sciences: Physical sciences are those academic disciplines that aim to uncover the underlying laws of nature, often written in the language of mathematics. It is a collective term for areas of study including astronomy, chemistry, materials science and physics.
• Earth and environmental sciences: Earth and environmental sciences cover all aspects of Earth and planetary science and broadly encompass solid Earth processes, surface and atmospheric dynamics, Earth system history, climate and climate change, marine and freshwater systems, and ecology. It also considers the interactions between humans and these systems.
• Biological sciences: Biological sciences encompass all the divisions of natural sciences examining various aspects of vital processes. The concept includes anatomy, physiology, cell biology, biochemistry and biophysics, and covers all organisms, from microorganisms and animals to plants.
• Health sciences: The health sciences study health, disease and healthcare. This field of study aims to develop knowledge, interventions and technology for use in healthcare to improve the treatment of patients.