Neighborhood ozone estimation in Busan, South Korea: A comparative study of proximity-based ensemble clustering and machine-learning models
Authors: Ahmad Daudsyah Imami, Jurng-Jae Yee
Journal: Atmospheric Pollution Research, volume 16, issue 10, Article 102601 (Journal Article; impact factor 3.9; JCR Q2, Environmental Sciences)
DOI: 10.1016/j.apr.2025.102601
Publication date: 2025-06-03
URL: https://www.sciencedirect.com/science/article/pii/S130910422500203X
Citation count: 0
Abstract
Busan is one of the southernmost metropolitan areas in South Korea and has among the highest ozone pollution levels, influenced by urban development and specific coastal meteorological conditions. This study presents Cluster-Based Ensemble Regression (CBER), a two-stage workflow consisting of a benchmarking stage (Pre-CBER) and an operational stage (CBER). During Pre-CBER, six machine-learning algorithms were compared, and multiple unsupervised-learning techniques were evaluated in parallel to cluster stations with similar meteorological characteristics and ozone patterns. Hyperparameter-tuned XGBoost emerged as the most accurate regressor (RMSE = 3.69 ppb, R² = 0.95). Nine clustering scenarios were assessed with the Silhouette score, and solutions based on both centroid-based and density-based clustering were ultimately retained. In the CBER phase, XGBoost models were trained within each shortlisted cluster scenario and validated through leave-one-station-out tests. KNN-based Meteorological Regionalization preserved fine-scale variability, sustaining R² > 0.90 and RMSE < 7 ppb in 12 of 14 clusters, while still achieving an average R² of 0.75–0.80 in the emissions-intensive port and the mountainous northeast. SHAP interpretation ranked nitrogen dioxide, temperature, solar radiation, and diurnal timing as the dominant predictors. The computationally light, transparent pipeline thus converts sparse monitoring into hourly subdistrict ozone maps, providing actionable decision support for Busan and other coastal cities with limited AQMS networks.
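The leave-one-station-out validation mentioned in the abstract can be illustrated with a minimal sketch. The abstract gives no implementation details, so everything below is an assumption made for illustration: the function names, the single-predictor least-squares line standing in for the XGBoost regressor, and the synthetic toy records (which are not data from the study). The sketch only shows the mechanics: within one cluster, each station is held out in turn, a model is fitted on the remaining stations, and RMSE and R² are computed on the held-out station.

```python
import math


def rmse(y_true, y_pred):
    """Root-mean-square error between observed and predicted values."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


def r2(y_true, y_pred):
    """Coefficient of determination; assumes y_true is not constant."""
    y_bar = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - y_bar) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def fit_line(xs, ys):
    """Least-squares line y = a + b*x. Hypothetical stand-in for XGBoost."""
    x_bar = sum(xs) / len(xs)
    y_bar = sum(ys) / len(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    b = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / sxx
    a = y_bar - b * x_bar
    return lambda x: a + b * x


def leave_one_station_out(records):
    """records: iterable of (station, predictor, ozone_ppb) for ONE cluster.

    Returns {station: (rmse, r2)}, each scored on a station the model never saw.
    """
    records = list(records)
    results = {}
    for held_out in sorted({station for station, _, _ in records}):
        train = [(x, y) for s, x, y in records if s != held_out]
        test = [(x, y) for s, x, y in records if s == held_out]
        model = fit_line([x for x, _ in train], [y for _, y in train])
        y_true = [y for _, y in test]
        y_pred = [model(x) for x, _ in test]
        results[held_out] = (rmse(y_true, y_pred), r2(y_true, y_pred))
    return results


def toy_records():
    """Synthetic, noise-free records (NOT study data): ozone = 20 + 1.5 * temperature."""
    temps = {"A": [10, 20, 30], "B": [12, 22, 32], "C": [15, 25, 35]}
    return [(s, t, 20 + 1.5 * t) for s, ts in temps.items() for t in ts]


if __name__ == "__main__":
    for station, (err, score) in leave_one_station_out(toy_records()).items():
        print(f"held-out station {station}: RMSE={err:.3f} ppb, R2={score:.3f}")
```

In a realistic setting the stand-in line would be replaced by a tuned gradient-boosting regressor with several predictors, and the loop would be repeated for every cluster in each shortlisted scenario. scikit-learn's `LeaveOneGroupOut` splitter produces the same kind of split when station identifiers are passed as groups.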
About the journal:
Atmospheric Pollution Research (APR) is an international journal designed for the publication of articles on air pollution. Papers should present novel experimental results, theory and modeling of air pollution on local, regional, or global scales. Areas covered are research on inorganic, organic, and persistent organic air pollutants, air quality monitoring, air quality management, atmospheric dispersion and transport, air-surface (soil, water, and vegetation) exchange of pollutants, dry and wet deposition, indoor air quality, exposure assessment, health effects, satellite measurements, natural emissions, atmospheric chemistry, greenhouse gases, and effects on climate change.