A two-stage algorithm to estimate ground-level PM2.5 concentration levels in Madrid (Spain) from AOD satellite data and surface proxies
J.M. Cordero. Atmospheric Pollution Research, 16(12), Article 102678. Published 2025-08-06. DOI: 10.1016/j.apr.2025.102678
https://www.sciencedirect.com/science/article/pii/S1309104225002806
Poor air quality in urban areas is an important health risk; therefore, reducing population exposure to pollutants such as PM2.5 is a major concern. Health assessments of this pollutant have typically relied on measurements from urban networks of Air Quality Monitoring Stations (AQMS) to estimate population exposure. The methods used to spatially interpolate these observations often lack a solid physical basis. Mesoscale air quality models provide ground-level concentrations at high spatiotemporal resolution based on urban features, including the distribution of pollution sources; however, they are subject to significant uncertainty. This work presents a novel methodology to produce maps of ground-level PM2.5 concentration at 1 km² resolution for the Municipality of Madrid during 2015. To this end, several data sets were used, including meteorology, satellite observations of aerosol optical depth (AOD) from MAIAC, emission data, population, land use, and vegetation land cover. We then applied extreme gradient boosting (XGBoost) machine learning algorithms in two steps: first to fill gaps in the AOD field, and then to estimate ground-level PM2.5 concentration. The predictions of the so-called 2_step_XGBoost algorithm were compared with all available ground-level PM2.5 observations from the AQMS in Madrid, yielding a coefficient of determination (r²) of 0.96, an RMSE of 1.5 μg/m³, and negligible bias. Additionally, we used 10-fold cross-validation to confirm the robustness of the algorithm and the independence of the dataset used for training (r² of 0.94 ± 0.01, RMSE of 0.40 ± 0.04, and MAE of 0.22 ± 0.02). These results highlight the reliability of this approach for future urban health analyses. In addition, a Feature Importance (FI) analysis revealed that 2_step_XGBoost identified the planetary boundary layer height (PBLH) as the most influential variable, while AOD was found to have relatively low explanatory power, a result that should be checked against other case studies.
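The two-step structure described in the abstract can be illustrated with a minimal Python sketch. It is not the author's implementation: a one-predictor least-squares line stands in for XGBoost so that the example runs with the standard library alone, and the real model uses many more predictors. The function names (`fit_line`, `two_stage_estimate`, `scores`), the dictionary keys, and the toy numbers are all hypothetical. The sketch also assumes that r² is computed as 1 − SS_res/SS_tot, which the abstract does not specify.

```python
import math
import statistics


def fit_line(xs, ys):
    """Stand-in regressor: one-predictor least squares (NOT XGBoost)."""
    mean_x, mean_y = statistics.fmean(xs), statistics.fmean(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx if sxx else 0.0
    intercept = mean_y - slope * mean_x
    return lambda x: slope * x + intercept


def two_stage_estimate(cells, fit=fit_line):
    """Estimate PM2.5 for every grid cell.

    Each cell is a dict with hypothetical keys: 'pblh' (always present),
    'aod' (None where the satellite retrieval is missing) and
    'pm25' (None where there is no monitoring station).
    """
    # Stage 1: model AOD from a covariate with full coverage, then fill gaps.
    retrieved = [c for c in cells if c["aod"] is not None]
    aod_model = fit([c["pblh"] for c in retrieved], [c["aod"] for c in retrieved])
    aod_filled = [
        c["aod"] if c["aod"] is not None else aod_model(c["pblh"]) for c in cells
    ]

    # Stage 2: model PM2.5 from the gap-filled AOD at cells with observations.
    observed = [
        (a, c["pm25"]) for a, c in zip(aod_filled, cells) if c["pm25"] is not None
    ]
    pm_model = fit([a for a, _ in observed], [p for _, p in observed])
    return [pm_model(a) for a in aod_filled]


def scores(observed, predicted):
    """Return r2 (as 1 - SS_res / SS_tot), RMSE, MAE and mean bias."""
    errors = [p - o for o, p in zip(observed, predicted)]
    mean_obs = statistics.fmean(observed)
    ss_res = sum(e ** 2 for e in errors)
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return {
        "r2": 1.0 - ss_res / ss_tot,  # undefined if all observations are equal
        "rmse": math.sqrt(ss_res / len(errors)),
        "mae": statistics.fmean([abs(e) for e in errors]),
        "bias": statistics.fmean(errors),
    }


# Toy, noise-free data (invented for illustration only).
cells = [
    {"pblh": 500.0, "aod": 0.4, "pm25": 18.0},
    {"pblh": 1000.0, "aod": None, "pm25": 14.0},  # gap in the AOD field
    {"pblh": 1500.0, "aod": 0.2, "pm25": 10.0},
    {"pblh": 2000.0, "aod": 0.1, "pm25": None},   # no station in this cell
]
estimates = two_stage_estimate(cells)
stations = [i for i, c in enumerate(cells) if c["pm25"] is not None]
fit_scores = scores(
    [cells[i]["pm25"] for i in stations], [estimates[i] for i in stations]
)
print(estimates)
print(fit_scores)
```

Passing the regressor in as the `fit` argument keeps the two-stage logic independent of the model, so a gradient-boosting fit could replace `fit_line` without changing the pipeline. In a 10-fold cross-validation, `scores` would be computed on each held-out fold and the results summarized as mean ± standard deviation.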
About the journal:
Atmospheric Pollution Research (APR) is an international journal designed for the publication of articles on air pollution. Papers should present novel experimental results, theory and modeling of air pollution on local, regional, or global scales. Areas covered are research on inorganic, organic, and persistent organic air pollutants, air quality monitoring, air quality management, atmospheric dispersion and transport, air-surface (soil, water, and vegetation) exchange of pollutants, dry and wet deposition, indoor air quality, exposure assessment, health effects, satellite measurements, natural emissions, atmospheric chemistry, greenhouse gases, and effects on climate change.