Neighborhood ozone estimation in Busan, South Korea: A comparative study of proximity-based ensemble clustering and machine-learning models

IF 3.9 | Zone 3 (Environmental Science and Ecology) | Q2 (Environmental Sciences)
Ahmad Daudsyah Imami, Jurng-Jae Yee
Atmospheric Pollution Research, vol. 16, no. 10, Article 102601 (published 2025-06-03)
DOI: 10.1016/j.apr.2025.102601
URL: https://www.sciencedirect.com/science/article/pii/S130910422500203X
Citation count: 0

Abstract

Busan, one of the southernmost metropolitan areas, experiences some of the highest ozone pollution levels, influenced by urban development and specific coastal meteorological conditions. This study presents Cluster-Based Ensemble Regression (CBER), a two-stage workflow consisting of a benchmarking stage (Pre-CBER) and an operational stage (CBER). During Pre-CBER, six machine-learning algorithms were compared, and multiple unsupervised-learning techniques were evaluated in parallel to cluster stations with similar meteorological characteristics and ozone patterns. Hyperparameter-tuned XGBoost emerged as the most accurate regressor (RMSE = 3.69 ppb, R² = 0.95). Nine clustering scenarios were assessed with the Silhouette score, and both centroid-based and density-based clustering solutions were ultimately retained. In the CBER phase, XGBoost models were trained within each shortlisted cluster scenario and validated through leave-one-station-out tests. KNN-based Meteorological Regionalization preserved fine-scale variability, sustaining R² > 0.90 and RMSE < 7 ppb in 12 of 14 clusters, while still achieving an average R² of 0.75–0.80 in the emissions-intensive port and the mountainous northeast. SHAP interpretation ranked nitrogen dioxide, temperature, solar radiation, and diurnal timing as the dominant predictors. The computationally light, transparent pipeline thus converts sparse monitoring data into hourly subdistrict ozone maps, providing actionable decision support for Busan and other coastal cities with limited AQMS networks.
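The leave-one-station-out validation mentioned in the abstract, together with the RMSE and R² metrics it reports, can be illustrated with a minimal sketch. This is not the authors' pipeline: the data are synthetic, the station names and the helper functions (`fit_hourly_mean`, `leave_one_station_out`, `make_synthetic_rows`) are hypothetical, and a simple hour-of-day mean stands in for the tuned XGBoost regressor so that the example runs with the Python standard library alone.

```python
import math
import random
from collections import defaultdict
from statistics import fmean


def rmse(y_true, y_pred):
    """Root-mean-square error between observed and predicted values."""
    return math.sqrt(fmean((t - p) ** 2 for t, p in zip(y_true, y_pred)))


def r2(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_true = fmean(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def fit_hourly_mean(rows):
    """Stand-in regressor (assumption): mean ozone for each hour of day."""
    by_hour = defaultdict(list)
    for _station, hour, ozone in rows:
        by_hour[hour].append(ozone)
    return {hour: fmean(values) for hour, values in by_hour.items()}


def leave_one_station_out(rows):
    """Hold out each station in turn, train on the rest, score on the held-out one."""
    scores = {}
    for held_out in sorted({station for station, _, _ in rows}):
        train = [r for r in rows if r[0] != held_out]
        test = [r for r in rows if r[0] == held_out]
        model = fit_hourly_mean(train)
        y_true = [ozone for _, _, ozone in test]
        y_pred = [model[hour] for _, hour, _ in test]
        scores[held_out] = (rmse(y_true, y_pred), r2(y_true, y_pred))
    return scores


def make_synthetic_rows(n_stations=4, n_days=10, seed=0):
    """Synthetic hourly ozone (ppb): diurnal cycle + station offset + noise."""
    rng = random.Random(seed)
    rows = []
    for s in range(n_stations):
        offset = rng.uniform(-2.0, 2.0)  # hypothetical station-level bias
        for _day in range(n_days):
            for hour in range(24):
                diurnal = 30.0 + 15.0 * math.sin(math.pi * (hour - 6) / 12)
                rows.append((f"S{s}", hour, diurnal + offset + rng.gauss(0.0, 2.0)))
    return rows


rows = make_synthetic_rows()
scores = leave_one_station_out(rows)
for station, (err, fit) in scores.items():
    print(f"{station}: RMSE = {err:.2f} ppb, R2 = {fit:.2f}")
```

Because each station is scored by a model that never saw its data, the per-station scores indicate how well estimates might transfer to locations without a monitor, which is the situation the subdistrict ozone maps address. Swapping the stand-in for a real regressor only requires replacing `fit_hourly_mean` and the prediction line.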


Source journal
Atmospheric Pollution Research (Environmental Sciences)
CiteScore: 8.30
Self-citation rate: 6.70%
Articles published: 256
Review time: 36 days
Journal description: Atmospheric Pollution Research (APR) is an international journal designed for the publication of articles on air pollution. Papers should present novel experimental results, theory and modeling of air pollution on local, regional, or global scales. Areas covered are research on inorganic, organic, and persistent organic air pollutants, air quality monitoring, air quality management, atmospheric dispersion and transport, air-surface (soil, water, and vegetation) exchange of pollutants, dry and wet deposition, indoor air quality, exposure assessment, health effects, satellite measurements, natural emissions, atmospheric chemistry, greenhouse gases, and effects on climate change.