Myeong-Gyun Kim, Se-Young Kim, Kwangyul Lee, Pilho Kim, Hyoseon Kim, Hyo-Jong Song
Atmospheric Pollution Research, 16 (11), Article 102672. Published 2025-07-23. DOI: 10.1016/j.apr.2025.102672. Impact factor 3.5; JCR Q2 (Environmental Sciences). https://www.sciencedirect.com/science/article/pii/S1309104225002740
Development and interpretation of PM2.5 estimation model for the Seoul Metropolitan Area using machine learning and explainable AI
PM2.5 is emitted and formed in the atmosphere through various factors and poses significant health risks to humans. Accurately estimating PM2.5 concentrations and analyzing the contributions of individual factors are therefore crucial. A Deep Neural Network (DNN) model was developed for PM2.5 estimation in the Seoul Metropolitan Area in South Korea, and two other machine learning models, Random Forest and Extreme Gradient Boosting, were built for performance comparison. Among these, the DNN model performed best, with an R² of 0.95, an MSE of 12.14, and an MAE of 2.6. Explainable Artificial Intelligence (XAI) techniques, including Vanilla Gradient and Shapley Additive Explanation (SHAP), were then applied to interpret the PM2.5 estimation model and analyze the contribution of each factor. The contribution analysis for the Seoul Metropolitan Area revealed that NO₃⁻ and NH₄⁺ had the highest contributions to PM2.5 formation, indicating that secondary formation mechanisms play a dominant role. At high concentrations, the contributions of NO₃⁻, NH₄⁺, and SO₄²⁻ were the highest, and the contributions of metal components and PM10 were higher than average. In particular, NH₄⁺ and K showed a positive correlation with PM2.5 formation. Future research will focus on refining the model through clustering-based approaches and other enhancements, aiming to deepen the understanding of PM2.5 formation patterns and provide meaningful insights for policymaking.
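To make the evaluation metrics and the Vanilla Gradient idea named in the abstract concrete, the following is a minimal sketch in plain Python. It is not the authors' code: the function names, the one-hidden-layer tanh network, and all numeric values are hypothetical and chosen only for illustration. The first function computes R², MSE, and MAE from paired observations and predictions; the second returns the gradient of a toy network's output with respect to its inputs, which is the quantity Vanilla Gradient uses as a per-feature sensitivity score.

```python
import math


def regression_metrics(y_true, y_pred):
    """Return (R^2, MSE, MAE) for two equal-length sequences of numbers."""
    n = len(y_true)
    residuals = [t - p for t, p in zip(y_true, y_pred)]
    mse = sum(r * r for r in residuals) / n
    mae = sum(abs(r) for r in residuals) / n
    mean_true = sum(y_true) / n
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    r2 = 1.0 - (mse * n) / ss_tot  # 1 - SS_res / SS_tot
    return r2, mse, mae


def forward(w1, b1, w2, x):
    """Toy model (an assumption): f(x) = w2 . tanh(w1 @ x + b1).

    Returns the scalar output and the hidden activations.
    """
    hidden = [
        math.tanh(sum(w * xi for w, xi in zip(row, x)) + b)
        for row, b in zip(w1, b1)
    ]
    output = sum(v * h for v, h in zip(w2, hidden))
    return output, hidden


def vanilla_gradient(w1, b1, w2, x):
    """Gradient of the toy model's output with respect to each input feature."""
    _, hidden = forward(w1, b1, w2, x)
    grad = [0.0] * len(x)
    for row, v, h in zip(w1, w2, hidden):
        scale = v * (1.0 - h * h)  # derivative of tanh is 1 - tanh^2
        for i, w in enumerate(row):
            grad[i] += scale * w
    return grad


# Made-up values, not data or weights from the study.
Y_TRUE = [10.0, 20.0, 30.0, 40.0]
Y_PRED = [12.0, 18.0, 33.0, 39.0]
W1 = [[0.5, -0.2], [0.1, 0.3]]
B1 = [0.0, 0.1]
W2 = [1.0, -2.0]
X = [0.4, 0.7]

r2, mse, mae = regression_metrics(Y_TRUE, Y_PRED)
sensitivities = vanilla_gradient(W1, B1, W2, X)
```

A larger absolute gradient for a feature means the model output is locally more sensitive to that input. SHAP, by contrast, attributes the prediction to features relative to a baseline and is not reproduced here.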
About the journal:
Atmospheric Pollution Research (APR) is an international journal designed for the publication of articles on air pollution. Papers should present novel experimental results, theory and modeling of air pollution on local, regional, or global scales. Areas covered are research on inorganic, organic, and persistent organic air pollutants, air quality monitoring, air quality management, atmospheric dispersion and transport, air-surface (soil, water, and vegetation) exchange of pollutants, dry and wet deposition, indoor air quality, exposure assessment, health effects, satellite measurements, natural emissions, atmospheric chemistry, greenhouse gases, and effects on climate change.