Seongjun Park, Kwang-Joo Moon, Hyo-Jin Eom, Seung-Muk Yi, Youngkwon Kim, Moonkyung Kim, Donghyun Rim, Young Su Lee
Environmental Pollution (IF 7.6; JCR Q1, Environmental Sciences). Published 2025-05-02. DOI: 10.1016/j.envpol.2025.126362 (https://doi.org/10.1016/j.envpol.2025.126362)
Machine learning-based prediction of ambient CO2 and CH4 concentrations with high temporal resolution in Seoul metropolitan area
Machine learning has the potential to support the growing need for high-resolution greenhouse gas monitoring in urban and industrial environments, where deploying extensive sensor networks is often limited by cost and operational challenges. This study presents a novel approach for estimating greenhouse gas (GHG) concentrations using routinely collected air quality and meteorological data from existing monitoring stations. Focusing on the Seoul metropolitan area, we developed and evaluated three machine learning models - Random Forest, Long Short-Term Memory (LSTM), and an ensemble learning approach - to predict CO2 and CH4 concentrations without relying on additional GHG monitoring equipment. Among these, the ensemble learning model outperformed the individual models, consistently achieving lower error metrics, even in data-limited scenarios. Feature importance analysis identifies NO2, CO, O3, and temperature as key predictors of CO2 and CH4 level variations, highlighting the influence of combustion-related pollutants and photochemical processes. Cross-validation results confirm the model’s out-of-sample capabilities; however, local factors, such as traffic density, industrial activities, and meteorology, can affect performance. Consequently, model retraining or transfer learning may be required when applying the model to new locations with comparable emission profiles or atmospheric conditions. These findings emphasize the importance of localized context in model application while also demonstrating the broader applicability of the approach. By utilizing data already available through urban monitoring networks, this study offers a scalable and cost-effective strategy to support high-resolution GHG monitoring and inform targeted climate policies in complex urban-industrial regions.
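The abstract does not say how the ensemble combines its base models. As a minimal sketch, assuming a simple weighted average of two base models' predictions, the example below illustrates why such an ensemble can reach a lower RMSE than either model alone when their errors are largely independent. The data are synthetic stand-ins, not the study's measurements, and the names `rmse` and `average_ensemble` are hypothetical helpers introduced only for illustration.

```python
import math
import random


def rmse(y_true, y_pred):
    """Root-mean-square error between two equal-length sequences."""
    squared = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return math.sqrt(sum(squared) / len(squared))


def average_ensemble(pred_a, pred_b, weight_a=0.5):
    """Weighted average of two models' predictions.

    Assumption: the study's ensemble may be built differently; plain
    averaging is used here only to illustrate the idea.
    """
    return [weight_a * a + (1.0 - weight_a) * b for a, b in zip(pred_a, pred_b)]


# Synthetic hourly "CO2" series in ppm with a daily cycle (NOT real data).
rng = random.Random(0)
true_co2 = [420.0 + 15.0 * math.sin(2 * math.pi * hour / 24) for hour in range(240)]

# Two hypothetical base models whose errors are independent Gaussian noise.
pred_model_a = [y + rng.gauss(0.0, 4.0) for y in true_co2]
pred_model_b = [y + rng.gauss(0.0, 4.0) for y in true_co2]

# Equal-weight ensemble of the two base models.
pred_ensemble = average_ensemble(pred_model_a, pred_model_b)

rmse_a = rmse(true_co2, pred_model_a)
rmse_b = rmse(true_co2, pred_model_b)
rmse_ens = rmse(true_co2, pred_ensemble)

print(f"model A RMSE:  {rmse_a:.2f} ppm")
print(f"model B RMSE:  {rmse_b:.2f} ppm")
print(f"ensemble RMSE: {rmse_ens:.2f} ppm")
```

Averaging helps here only because the two simulated error series are independent; if the base models make strongly correlated errors, the gain shrinks, which is one reason an ensemble would typically pair dissimilar learners such as a tree-based model and a recurrent network.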
About the journal:
Environmental Pollution is an international peer-reviewed journal that publishes high-quality research papers and review articles covering all aspects of environmental pollution and its impacts on ecosystems and human health.
Subject areas include, but are not limited to:
• Sources and occurrences of pollutants that are clearly defined and measured in environmental compartments, food and food-related items, and human bodies;
• Interlinks between contaminant exposure and biological, ecological, and human health effects, including those of climate change;
• Contaminants of emerging concern (including but not limited to antibiotic-resistant microorganisms or genes, microplastics/nanoplastics, electronic waste, light, and noise) and/or their biological, ecological, or human health effects;
• Laboratory and field studies on the remediation/mitigation of environmental pollution via new techniques and with clear links to biological, ecological, or human health effects;
• Modeling of pollution processes, patterns, or trends that is of clear environmental and/or human health interest;
• New techniques that measure and examine environmental occurrences, transport, behavior, and effects of pollutants within the environment or the laboratory, provided that they can be clearly used to address problems within regional or global environmental compartments.