{"title":"Optimal feature selection for improved ML based reconstruction of Global Terrestrial Water Storage Anomalies","authors":"Nehar Mandal, Prabal Das, Kironmala Chanda","doi":"10.5194/essd-2024-109","DOIUrl":"https://doi.org/10.5194/essd-2024-109","url":null,"abstract":"Abstract. Understanding long-term terrestrial water storage (TWS) variations is vital for investigating hydrological extreme events, managing water resources, and assessing climate change impacts. However, the limited data duration from the Gravity Recovery and Climate Experiment (GRACE) and its follow-on missions (GRACE-FO) poses challenges for comprehensive long-term analysis. In this study, we reconstruct TWS anomalies (TWSA) for the period Jan 1960 to Dec 2022, thereby filling data gaps between the GRACE and GRACE-FO missions as well as generating a complete dataset for the pre-GRACE era. The workflow involves identifying optimal predictors from land surface model (LSM) outputs, meteorological variables, and climatic indices using a novel Bayesian Network (BN) technique for grid-based TWSA simulations. Climate indices, like the Oceanic Niño Index and Dipole Mode Index, are selected as optimal predictors for a large number of grids globally, along with TWSA from LSM outputs. The most effective machine learning (ML) algorithms among Convolutional Neural Network (CNN), Support Vector Regression (SVR), Extra Trees Regressor (ETR), and Stacking Ensemble Regression (SER) models are evaluated at each grid location to achieve optimal reproducibility. Globally, ETR performs best for most of the grids, which is also noticed at the river-basin scale, particularly for the Ganga-Brahmaputra-Meghana, Godavari, Krishna, Limpopo, and Nile river basins. The simulated TWSA (BNML_TWSA) outperformed the TWSA from LSM outputs when evaluated against GRACE datasets. 
Improvements are particularly noted in the river basins such as Godavari, Krishna, Danube, Amazon, etc., with median values of the correlation coefficient, Nash-Sutcliffe efficiency, and RMSE for all grids in Godavari, India, being 0.927, 0.839, and 63.7 mm respectively. A comparison with TWSA reconstructed in recent studies indicates that the proposed BNML_TWSA outperforms them globally as well as for all the 11 major river basins examined. The presented dataset is published at https://doi.org/10.6084/m9.figshare.25376695 (Mandal et al., 2024) and updates will be published when needed.","PeriodicalId":48747,"journal":{"name":"Earth System Science Data","volume":"1 1","pages":""},"PeriodicalIF":11.4,"publicationDate":"2024-05-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140953595","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"地球科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Brazilian Atmospheric Inventories – BRAIN: a comprehensive database of air quality in Brazil","authors":"Leonardo Hoinaski, Robson Will, Camilo Bastos Ribeiro","doi":"10.5194/essd-16-2385-2024","DOIUrl":"https://doi.org/10.5194/essd-16-2385-2024","url":null,"abstract":"Abstract. Developing air quality management systems to control the impacts of air pollution requires reliable data. However, current initiatives do not provide datasets with large spatial and temporal resolutions for developing air pollution policies in Brazil. Here, we introduce the Brazilian Atmospheric Inventories (BRAIN), the first comprehensive database of air quality and its drivers in Brazil. BRAIN encompasses hourly datasets of meteorology, emissions, and air quality. The emissions dataset includes vehicular emissions derived from the Brazilian Vehicular Emissions Inventory Software (BRAVES), industrial emissions produced with local data from the Brazilian environmental agencies, biomass burning emissions from FINN – Fire INventory from the National Center for Atmospheric Research (NCAR), and biogenic emissions from the Model of Emissions of Gases and Aerosols from Nature (MEGAN) (https://doi.org/10.57760/sciencedb.09858, Hoinaski et al., 2023a; https://doi.org/10.57760/sciencedb.09886, Hoinaski et al., 2023b). The meteorology dataset has been derived from the Weather Research and Forecasting Model (WRF) (https://doi.org/10.57760/sciencedb.09857, Hoinaski and Will, 2023a; https://doi.org/10.57760/sciencedb.09885, Hoinaski and Will, 2023c). The air quality dataset contains the surface concentration of 216 air pollutants produced from coupling meteorological and emissions datasets with the Community Multiscale Air Quality Modeling System (CMAQ) (https://doi.org/10.57760/sciencedb.09859, Hoinaski and Will, 2023b; https://doi.org/10.57760/sciencedb.09884, Hoinaski and Will, 2023d). 
We provide gridded data in two domains, one covering the Brazilian territory with 20×20 km spatial resolution and another covering southern Brazil with 4×4 km spatial resolution. This paper describes how the datasets were produced, their limitations, and their spatiotemporal features. To evaluate the quality of the database, we compare the air quality dataset with 244 air quality monitoring stations, providing the model's performance for each pollutant measured by the monitoring stations. We present a sample of the spatial variability of emissions, meteorology, and air quality in Brazil from 2019, revealing the hotspots of emissions and air pollution issues. By making BRAIN publicly available, we aim to provide the required data for developing air quality policies on municipal and state scales, especially for under-developed and data-scarce municipalities. We also envision that BRAIN has the potential to create new insights into and opportunities for air pollution research in Brazil.","PeriodicalId":48747,"journal":{"name":"Earth System Science Data","volume":"48 1","pages":""},"PeriodicalIF":11.4,"publicationDate":"2024-05-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140949419","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"地球科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"European topsoil bulk density and organic carbon stock database (0–20 cm) using machine-learning-based pedotransfer functions","authors":"Songchao Chen, Zhongxing Chen, Xianglin Zhang, Zhongkui Luo, Calogero Schillaci, Dominique Arrouays, Anne Christine Richer-de-Forges, Zhou Shi","doi":"10.5194/essd-16-2367-2024","DOIUrl":"https://doi.org/10.5194/essd-16-2367-2024","url":null,"abstract":"Abstract. Soil bulk density (BD) serves as a fundamental indicator of soil health and quality, exerting a significant influence on critical factors such as plant growth, nutrient availability, and water retention. Due to its limited availability in soil databases, the application of pedotransfer functions (PTFs) has emerged as a potent tool for predicting BD using other easily measurable soil properties, while the impact of these PTFs' performance on soil organic carbon (SOC) stock calculation has been rarely explored. In this study, we proposed an innovative local modeling approach for predicting BD of fine earth (BDfine) across Europe using the recently released BDfine data from the LUCAS Soil (Land Use and Coverage Area Frame Survey Soil) 2018 (0–20 cm) and relevant predictors. Our approach involved a combination of neighbor sample search, forward recursive feature selection (FRFS), and random forest (RF) models (local-RFFRFS). The results showed that local-RFFRFS had a good performance in predicting BDfine (R2 of 0.58, root mean square error (RMSE) of 0.19 g cm−3, relative error (RE) of 16.27 %), surpassing the earlier-published PTFs (R2 of 0.40–0.45, RMSE of 0.22 g cm−3, RE of 19.11 %–21.18 %) and global PTFs using RF models with and without FRFS (R2 of 0.56–0.57, RMSE of 0.19 g cm−3, RE of 16.47 %–16.74 %). Interestingly, we found that the best earlier-published PTF (R2 = 0.84, RMSE = 1.39 kg m−2, RE of 17.57 %) performed close to the local-RFFRFS (R2 = 0.85, RMSE = 1.32 kg m−2, RE of 15.01 %) in SOC stock calculation using BDfine predictions. 
However, the local-RFFRFS still performed better (ΔR2 > 0.2) for soil samples with low SOC stocks (< 3 kg m−2). Therefore, we suggest that the local-RFFRFS is a promising method for BDfine prediction, while earlier-published PTFs would be more efficient when BDfine is subsequently utilized for calculating SOC stock. Finally, we produced two topsoil BDfine and SOC stock datasets (18 945 and 15 389 soil samples) at 0–20 cm for LUCAS Soil 2018 using the best earlier-published PTF and local-RFFRFS, respectively. This dataset is archived on the Zenodo platform at https://doi.org/10.5281/zenodo.10211884 (S. Chen et al., 2023). The outcomes of this study present a meaningful advancement in enhancing the predictive accuracy of BDfine, and the resultant BDfine and SOC stock datasets for topsoil across Europe enable more precise soil hydrological and biological modeling.","PeriodicalId":48747,"journal":{"name":"Earth System Science Data","volume":"24 1","pages":""},"PeriodicalIF":11.4,"publicationDate":"2024-05-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140949353","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"地球科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Two sets of bias-corrected regional UK Climate Projections 2018 (UKCP18) of temperature, precipitation and potential evapotranspiration for Great Britain","authors":"Nele Reyniers, Qianyu Zha, Nans Addor, Timothy J. Osborn, Nicole Forstenhäusler, Yi He","doi":"10.5194/essd-2024-132","DOIUrl":"https://doi.org/10.5194/essd-2024-132","url":null,"abstract":"Abstract. The United Kingdom Climate Projections 2018 (UKCP18) regional climate model (RCM) 12 km regional perturbed physics ensemble (UKCP18-RCM-PPE) is one of the three strands of the latest set of UK national climate projections produced by the UK Met Office. It has been widely adopted in climate impact assessment. In this study, we report biases in the raw UKCP18-RCM simulations that are significant and are likely to deteriorate impact assessments if they are not adjusted. Two methods were used to bias-correct UKCP18-RCM: non-parametric quantile mapping using empirical quantiles and a variant developed for the third phase of the Inter-Sectoral Impact Model Intercomparison Project (ISIMIP) designed to preserve the climate change signal. Specifically, daily temperature and precipitation simulations for 1981 to 2080 were adjusted for the 12 ensemble members. Potential evapotranspiration was also estimated over the same period using the Penman-Monteith formulation and then bias-corrected using the latter method. Both methods successfully corrected biases in a range of daily temperature, precipitation and potential evapotranspiration metrics, and reduced biases in multi-day precipitation metrics to a lesser degree. An exploratory analysis of the projected future changes confirms the expectation of wetter, warmer winters and hotter, drier summers, and shows uneven changes in different parts of the distributions of both temperature and precipitation. Both bias-correction methods preserved the climate change signal almost equally well, as well as the spread among the projected changes. 
The change factor method was used as a benchmark for precipitation, and we show that it fails to capture changes in a range of variables, making it inadequate for most impact assessments. By comparing the differences between the two bias-correction methods and within the 12 ensemble members, we show that the uncertainty in future precipitation and temperature changes stemming from the climate model parameterisation far outweighs the uncertainty introduced by selecting one of these two bias-correction methods. We conclude by providing guidance on the use of the bias-corrected data sets. The data sets bias-adjusted with ISIMIP3BA are publicly available in the following repositories: https://doi.org/10.5281/zenodo.6337381 for precipitation and temperature (Reyniers et al., 2022a) and https://doi.org/10.5281/zenodo.6320707 for potential evapotranspiration (Reyniers et al., 2022b). The datasets bias-corrected using the quantile mapping method are available at https://doi.org/10.5281/zenodo.8223024 (Zha et al., 2023).","PeriodicalId":48747,"journal":{"name":"Earth System Science Data","volume":"145 1","pages":""},"PeriodicalIF":11.4,"publicationDate":"2024-05-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140949414","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"地球科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Evapotranspiration evaluation using three different protocols on a large green roof in the greater Paris area","authors":"Pierre-Antoine Versini, Leydy Alejandra Castellanos-Diaz, David Ramier, Ioulia Tchiguirinskaia","doi":"10.5194/essd-16-2351-2024","DOIUrl":"https://doi.org/10.5194/essd-16-2351-2024","url":null,"abstract":"Abstract. Nature-based solutions have appeared as relevant solutions to mitigate urban heat islands. To improve our knowledge of the assessment of this ecosystem service and the related physical processes (evapotranspiration), monitoring campaigns are required. This was the objective of several experiments carried out on the Blue Green Wave, a large green roof located in Champs-sur-Marne (France). Three different protocols were implemented and tested to assess the evapotranspiration flux at different scales: the first one was based on the surface energy balance (large scale); the second one was carried out using an evapotranspiration chamber (small scale); and the third one was based on the water balance evaluated during dry periods (point scale). In addition to these evapotranspiration estimates, several hydrometeorological variables (especially temperature) were measured. Related data and Python programs providing preliminary elements of the analysis and graphical representation have been made available. They illustrate the space–time variability in the studied processes regarding their observation scale. 
The dataset is available at https://doi.org/10.5281/zenodo.8064053 (Versini et al., 2023).","PeriodicalId":48747,"journal":{"name":"Earth System Science Data","volume":"55 18 1","pages":""},"PeriodicalIF":11.4,"publicationDate":"2024-05-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140942685","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"地球科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A 10 km daily-level ultraviolet radiation predicting dataset based on machine learning models in China from 2005 to 2020","authors":"Yichen Jiang, Su Shi, Xinyue Li, Chang Xu, Haidong Kan, Bo Hu, Xia Meng","doi":"10.5194/essd-2024-111","DOIUrl":"https://doi.org/10.5194/essd-2024-111","url":null,"abstract":"Abstract. Ultraviolet (UV) radiation is closely related to health, but limited measurements have hindered further investigation of its health effects in China. Machine learning algorithms have been widely used to predict environmental factors with high accuracy, but few studies have applied them to UV radiation. This study aimed to develop a UV radiation prediction model based on the random forest method and to predict UV radiation at the daily level and 10 km resolution in mainland China in 2005–2020. A random forest model was employed to predict UV radiation by integrating ground UV radiation measurements from monitoring stations and multiple predictors, such as satellite UV radiation data. Missing satellite-based UV radiation data were filled using a three-day moving average. The model's performance was evaluated through multiple cross-validation (CV) methods. The overall R² (root mean square error, RMSE) between measured and predicted UV radiation from model development and model 10-fold CV was 0.97 (15.64 W m⁻²) and 0.83 (37.44 W m⁻²) at the daily level, respectively. The model with OMI EDD achieved higher predictive accuracy than the one without it. Based on predictions of UV radiation at the daily level and 10 km spatial resolution with nearly 100 % spatiotemporal coverage, we found that UV radiation increased by 4.20 % while PM2.5 levels decreased by 48.51 % and O3 levels rose by 22.70 % in 2013–2020, suggesting a potential correlation among these environmental factors. 
The uneven spatial distribution of UV radiation was found to be associated with factors such as latitude, elevation, meteorological conditions and season. The eastern areas of China face higher risk, with both high population density and high UV radiation intensity. Using a machine learning algorithm, this study generated a gridded dataset characterized by relatively high precision and extensive spatiotemporal coverage of UV radiation, which demonstrates the spatiotemporal variability of UV radiation levels in China and can facilitate health-related research in the future. This dataset is currently freely available at https://doi.org/10.5281/zenodo.10884591 (Jiang et al., 2024).","PeriodicalId":48747,"journal":{"name":"Earth System Science Data","volume":"32 1","pages":""},"PeriodicalIF":11.4,"publicationDate":"2024-05-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140942713","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"地球科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Global mapping of oil palm planting year from 1990 to 2021","authors":"Adrià Descals, David L. A. Gaveau, Serge Wich, Zoltan Szantoi, Erik Meijaard","doi":"10.5194/essd-2024-157","DOIUrl":"https://doi.org/10.5194/essd-2024-157","url":null,"abstract":"Abstract. Oil palm is a controversial crop, primarily because it is associated with negative environmental impacts such as tropical deforestation. Mapping the crop and its characteristics, such as age, is crucial for informing public and policy discussions regarding these impacts. Oil palm has received substantial mapping efforts, but up-to-date accurate oil palm maps for both extent and age are essential for monitoring impacts and informing concomitant debate. Here, we present a 10-meter resolution global map of industrial and smallholder oil palm, developed using Sentinel-1 data for the years 2016–2021 and a deep learning model based on convolutional neural networks. In addition, we used Landsat-5, -7, and -8 to estimate the planting year from 1990 to 2021 at a 30-meter spatial resolution. The planting year indicates the year of establishment for an oil palm plantation as of 2021, either newly planted or replanted oil palm in an existing plantation. We validated the oil palm extent layer using 17,812 randomly distributed reference points. The accuracy of the planting year layer was assessed using field data collected from 5,831 industrial parcels and 1,012 smallholder plantations distributed throughout the oil palm growing area. We found oil palm plantations covering a total mapped area of 23.98 Mha, and our area estimates are 16.66 ± 0.25 Mha of industrial and 7.59 ± 0.29 Mha of smallholder oil palm worldwide. The producers’ and users’ accuracies are 91.9 ± 3.4 % and 91.8 ± 1.0 % for industrial plantations, and 72.7 ± 1.3 % and 75.7 ± 2.5 % for smallholders, which improves upon a previous global oil palm dataset, particularly in terms of omission of oil palm. 
The overall mean error between estimated planting year and field data was -0.24 years and the root-mean-square error was 2.65 years, but the agreement was lower for smallholders. Mapping the extent and planting year of smallholder plantations remains challenging, particularly for wild and sparsely planted oil palm, and future mapping efforts should focus on these specific types of plantations. The average oil palm plantation age was 14.1 years, and the area of oil palm over 20 years was 6.28 Mha. Given that oil palm plantations are typically replanted after 25 years, our findings indicate that this area will require replanting within the coming decade, starting from 2021. Our dataset provides valuable input for optimal land use planning to meet the growing global demand for vegetable oils. The global oil palm extent layer for the year 2021 and the planting year layer from 1990 to 2021 can be found at https://doi.org/10.5281/zenodo.11034131 (Descals, 2024).","PeriodicalId":48747,"journal":{"name":"Earth System Science Data","volume":"20 1","pages":""},"PeriodicalIF":11.4,"publicationDate":"2024-05-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140942736","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"地球科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A global monthly field of seawater pH over 3 decades: a machine learning approach","authors":"Guorong Zhong, Xuegang Li, Jinming Song, Baoxiao Qu, Fan Wang, Yanjun Wang, Bin Zhang, Lijing Cheng, Jun Ma, Huamao Yuan, Liqin Duan, Ning Li, Qidong Wang, Jianwei Xing, Jiajia Dai","doi":"10.5194/essd-2024-151","DOIUrl":"https://doi.org/10.5194/essd-2024-151","url":null,"abstract":"Abstract. The continuous uptake of anthropogenic CO2 by the ocean leads to ocean acidification, which is an ongoing threat to the marine ecosystem. The ocean acidification rate has been documented globally in the surface ocean, but documentation below the surface is limited. Here, we present a monthly four-dimensional 1°×1° gridded product of global seawater pH, derived from a machine learning algorithm trained on pH observations on the total scale and at in-situ temperature from the Global Ocean Data Analysis Project (GLODAP). The constructed pH product covers the years 1992–2020 and depths from the surface to 2 km on 41 levels. Three types of machine learning algorithms were used in the pH product construction, including self-organizing map neural networks for region dividing, a stepwise algorithm for predictor selection, and feed-forward neural networks (FFNN) for non-linear relationship regression. The performance of the machine learning algorithm was validated against real observations using a cross-validation method, in which four iterations were carried out, each holding out a different 25 % of the observations for evaluation and using the remaining 75 % for training. The constructed pH product is evaluated through comparisons to time series observations and the GLODAP pH climatology. The overall root mean square error between the FFNN constructed pH and the GLODAP measurements is 0.028, ranging from 0.044 at the surface to 0.013 at 2000 m. 
The pH product is distributed through the data repository of the Marine Science Data Center of the Chinese Academy of Sciences at http://dx.doi.org/10.12157/IOCAS.20230720.001 (Zhong et al., 2023).","PeriodicalId":48747,"journal":{"name":"Earth System Science Data","volume":"33 1","pages":""},"PeriodicalIF":11.4,"publicationDate":"2024-05-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140942712","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"地球科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Retrieving Ground-Level PM2.5 Concentrations in China (2013–2021) with a Numerical Model-Informed Testbed to Mitigate Sample Imbalance-Induced Biases","authors":"Siwei Li, Yu Ding, Jia Xing, Joshua S. Fu","doi":"10.5194/essd-2024-170","DOIUrl":"https://doi.org/10.5194/essd-2024-170","url":null,"abstract":"Abstract. Ground-level PM2.5 data derived from satellites with machine learning are crucial for health and climate assessments; however, uncertainties persist due to the absence of observations with full spatial coverage. To address this, we propose a novel testbed using untraditional numerical simulations to evaluate PM2.5 estimation across the entire spatial domain. The testbed emulates the general machine-learning approach by training the model with grids corresponding to ground monitor sites and subsequently testing its predictive accuracy for other locations. Our approach enables, for the first time, a comprehensive evaluation of the performance of various machine-learning methods in estimating PM2.5 across the spatial domain. Unexpected results are shown in the application in China, with larger PM2.5 biases found in densely populated regions with abundant ground observations across all benchmark models, a finding that challenges conventional expectations and has not been explored in the recent literature. The imbalance in training samples, mostly from urban areas with high emissions, is the main reason, leading to significant overestimation due to the lack of monitors in downwind areas where PM2.5 is transported from urban areas with varying vertical profiles. Our proposed testbed also provides an efficient strategy for optimizing model structure or training samples to enhance satellite-retrieval model performance. 
Integration of spatiotemporal features, especially with CNN-based deep-learning approaches like the ResNet model, successfully mitigates PM2.5 overestimation (by 5–30 µg m⁻³) and corresponding exposure (by 3 million people · µg m⁻³) in the downwind area over the past nine years (2013–2021) compared to the traditional approach. Furthermore, the incorporation of 600 strategically positioned ground-measurement sites identified through the testbed is essential to achieve a more balanced distribution of training samples, thereby ensuring precise PM2.5 estimation and facilitating the assessment of associated impacts in China. In addition to presenting the retrieved surface PM2.5 concentrations in China from 2013 to 2021, this study provides a testbed dataset derived from physical modeling simulations which can serve to evaluate the performance of data-driven methodologies, such as machine learning, in estimating spatial PM2.5 concentrations for the community.","PeriodicalId":48747,"journal":{"name":"Earth System Science Data","volume":"20 1","pages":""},"PeriodicalIF":11.4,"publicationDate":"2024-05-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140942723","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"地球科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Water vapor Raman-lidar observations from multiple sites in the framework of WaLiNeAs","authors":"Frédéric Laly, Patrick Chazette, Julien Totems, Jérémy Lagarrigue, Laurent Forges, Cyrille Flamant","doi":"10.5194/essd-2024-73","DOIUrl":"https://doi.org/10.5194/essd-2024-73","url":null,"abstract":"Abstract. During the Water Vapor Lidar Network Assimilation (WaLiNeAs) campaign, 8 lidars specifically designed to measure water vapor mixing ratio (WVMR) profiles were deployed on the western Mediterranean coast. The main objectives were to investigate the water vapor content during case studies of heavy precipitation events in the coastal Western Mediterranean and assess the impact of high spatio-temporal resolution WVMR data on numerical weather prediction forecasts by means of state-of-the-art assimilation techniques. Given the increasing occurrence of extreme events due to climate change, WaLiNeAs is the first program in Europe to provide network-like, simultaneous and continuous water vapor profile measurements. This paper focuses on the WVMR profiling datasets obtained from three of the lidars managed by the French component of the WaLiNeAs team. These lidars were deployed in the towns of Coursan, Grau du Roi and Cannes. This measurement setup enabled monitoring of the water vapor content within the lower troposphere over a period of three months in autumn–winter 2022 and four months in summer 2023. The lidars measured the WVMR profiles from the surface up to approximately 6–10 km at night, and 1–2 km during daytime, with a vertical resolution of 100 m and a time sampling of 15–30 min, selected to meet the needs of weather forecasting with an uncertainty lower than 0.4 g kg⁻¹. The paper presents details about the instruments, the experimental strategy, as well as the datasets given in NetCDF format. 
The final dataset is divided into two datasets: the first, with a time resolution of 15 min, contains a total of 26 423 WVMR vertical profiles; the second has a time resolution of 30 min to improve the signal-to-noise ratio and signal altitude range.","PeriodicalId":48747,"journal":{"name":"Earth System Science Data","volume":"33 1","pages":""},"PeriodicalIF":11.4,"publicationDate":"2024-05-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140942694","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"地球科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}