Mostafa M Abdelmalek, Hatem Mahmoud, Hassan Shokry
Scientific Reports 15(1), 25890. Published 2025-07-17. DOI: https://doi.org/10.1038/s41598-025-11260-y
Prognosis of air quality index and air pollution using machine learning techniques.
Air pollution constitutes a significant challenge for both public health and environmental sustainability. Pollutants such as PM, O3, NO2, SO2, and CO cause serious health problems and ecological damage. This study develops and compares five machine learning (ML) models for predicting the Air Quality Index (AQI): Gaussian Process Regression (GPR), Ensemble Regression (ER), Support Vector Machine (SVM), Regression Tree (RT), and Kernel Approximation Regression (KAR). The publicly available historical air pollution dataset, collected from 1 January to 31 December 2022, was obtained from the online source titled 'A Real-time Dataset of Air Pollution Monitoring Generated Using IoT-Mendeley Data', developed by the Department of Software Engineering, Daffodil International University. While the dataset includes six pollutants (PM10, PM2.5, NO2, SO2, CO, and O3), only three (PM2.5, PM10, and CO) were selected for AQI prediction, based on their higher feature importance as determined using the Random Forest technique. To reduce the time and cost of measuring and analyzing these pollutants, the five ML models were used to predict the AQI from these three features alone. The GPR, ER, SVM, and RT models achieved AQI prediction accuracy above 96%, whereas the KAR model was less accurate, at 82.36%. In the comparative analysis, the GPR model outperformed the other ML models, with the lowest Root Mean Square Error (RMSE): 0.87 during training and 1.219 during testing. The findings highlight the value of ML in enhancing air quality prediction and monitoring, offering accurate tools for hourly data analysis and potential real-time application. Such tools can assist in devising more efficient air pollution control strategies, contributing to improved public health and environmental sustainability.
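The RMSE values quoted in the abstract are assumed here to follow the standard definition, the square root of the mean squared difference between observed and predicted AQI. The following is a minimal sketch of that calculation in plain Python. The helper name `rmse` and the sample values are illustrative assumptions; they are not taken from the study or its dataset.

```python
import math


def rmse(observed, predicted):
    """Root Mean Square Error between two equal-length sequences."""
    if len(observed) != len(predicted):
        raise ValueError("observed and predicted must have the same length")
    if len(observed) == 0:
        raise ValueError("at least one sample is required")
    # Square each residual, average the squares, then take the square root.
    squared_errors = [(o - p) ** 2 for o, p in zip(observed, predicted)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


# Made-up AQI values for illustration only (NOT from the paper's dataset).
observed_aqi = [50.0, 100.0, 150.0]
predicted_aqi = [52.0, 97.0, 151.0]

print(f"RMSE = {rmse(observed_aqi, predicted_aqi):.3f}")
```

Because RMSE is expressed in the same units as the AQI itself, a lower value on held-out test data indicates predictions that sit closer to the measured index.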
About the journal:
We publish original research from all areas of the natural sciences, psychology, medicine and engineering. You can learn more about what we publish by browsing our specific scientific subject areas below or explore Scientific Reports by browsing all articles and collections.
Scientific Reports has a 2-year impact factor: 4.380 (2021), and is the 6th most-cited journal in the world, with more than 540,000 citations in 2020 (Clarivate Analytics, 2021).
•Engineering
Engineering covers all aspects of engineering, technology, and applied science. It plays a crucial role in the development of technologies to address some of the world's biggest challenges, helping to save lives and improve the way we live.
•Physical sciences
Physical sciences are those academic disciplines that aim to uncover the underlying laws of nature — often written in the language of mathematics. It is a collective term for areas of study including astronomy, chemistry, materials science and physics.
•Earth and environmental sciences
Earth and environmental sciences cover all aspects of Earth and planetary science and broadly encompass solid Earth processes, surface and atmospheric dynamics, Earth system history, climate and climate change, marine and freshwater systems, and ecology. It also considers the interactions between humans and these systems.
•Biological sciences
Biological sciences encompass all the divisions of natural sciences examining various aspects of vital processes. The concept includes anatomy, physiology, cell biology, biochemistry and biophysics, and covers all organisms from microorganisms, animals to plants.
•Health sciences
The health sciences study health, disease and healthcare. This field of study aims to develop knowledge, interventions and technology for use in healthcare to improve the treatment of patients.