{"title":"LGHAP v2: a global gap-free aerosol optical depth and PM2.5 concentration dataset since 2000 derived via big Earth data analytics","authors":"Kaixu Bai, Ke Li, Liuqing Shao, Xinran Li, Chaoshun Liu, Zhengqiang Li, Mingliang Ma, Di Han, Yibing Sun, Zhe Zheng, Ruijie Li, Ni-Bin Chang, Jianping Guo","doi":"10.5194/essd-16-2425-2024","DOIUrl":null,"url":null,"abstract":"Abstract. The Long-term Gap-free High-resolution Air Pollutants (LGHAP) concentration dataset generated in our previous study has provided spatially contiguous daily aerosol optical depth (AOD) and fine particulate matter (PM2.5) concentrations at a 1 km grid resolution in China since 2000. This advancement empowered unprecedented assessments of regional aerosol variations and their influence on the environment, health, and climate over the past 20 years. However, there is a need to enhance such a high-quality AOD and PM2.5 concentration dataset with new robust features and extended spatial coverage. In this study, we present version 2 of a global-scale LGHAP dataset (LGHAP v2), which was generated using improved big Earth data analytics via a seamless integration of versatile data science, pattern recognition, and machine learning methods. Specifically, multimodal AODs and air quality measurements acquired from relevant satellites, ground monitoring stations, and numerical models were harmonized by harnessing the capability of random-forest-based data-driven models. Subsequently, an improved tensor-flow-based AOD reconstruction algorithm was developed to weave the harmonized multisource AOD products together for filling data gaps in Multi-Angle Implementation of Atmospheric Correction (MAIAC) AOD retrievals from Terra. The results of the ablation experiments demonstrated better performance of the improved tensor-flow-based gap-filling method in terms of both convergence speed and data accuracy. Ground-based validation results indicated good data accuracy of this global gap-free AOD dataset, with a correlation coefficient (R) of 0.85 and a root mean square error (RMSE) of 0.14 compared to the worldwide AOD observations from the AErosol RObotic NETwork (AERONET), outperforming the purely reconstructed AODs (R = 0.83, RMSE = 0.15), but they were slightly worse than raw MAIAC AOD retrievals (R = 0.88, RMSE = 0.11). For PM2.5 concentration mapping, a novel deep-learning approach, termed the SCene-Aware ensemble learning Graph ATtention network (SCAGAT), was hereby applied. While accounting for the scene representativeness of data-driven models across regions, the SCAGAT algorithm performed better during spatial extrapolation, largely reducing modeling biases over regions with limited and/or even absent in situ PM2.5 concentration measurements. The validation results indicated that the gap-free PM2.5 concentration estimates exhibit higher prediction accuracies, with an R of 0.95 and an RMSE of 5.7 µg m−3, compared to PM2.5 concentration measurements obtained from former holdout sites worldwide. Overall, while leveraging state-of-the-art methods in data science and artificial intelligence, a quality-enhanced LGHAP v2 dataset was generated through big Earth data analytics by cohesively weaving together multimodal AODs and air quality measurements from diverse sources. The gap-free, high-resolution, and global coverage merits render the LGHAP v2 dataset an invaluable database for advancing aerosol- and haze-related studies as well as triggering multidisciplinary applications for environmental management, health-risk assessment, and climate change attribution. All gap-free AOD and PM2.5 concentration grids in the LGHAP v2 dataset, as well as the data user guide and relevant visualization codes, are publicly accessible at https://zenodo.org/communities/ecnu_lghap (last access: 3 April 2024, Bai and Li, 2023a).","PeriodicalId":48747,"journal":{"name":"Earth System Science Data","volume":"17 1","pages":""},"PeriodicalIF":11.2000,"publicationDate":"2024-05-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Earth System Science Data","FirstCategoryId":"89","ListUrlMain":"https://doi.org/10.5194/essd-16-2425-2024","RegionNum":1,"RegionCategory":"地球科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"GEOSCIENCES, MULTIDISCIPLINARY","Score":null,"Total":0}
引用次数: 0
Abstract
Abstract. The Long-term Gap-free High-resolution Air Pollutants (LGHAP) concentration dataset generated in our previous study has provided spatially contiguous daily aerosol optical depth (AOD) and fine particulate matter (PM2.5) concentrations at a 1 km grid resolution in China since 2000. This advancement empowered unprecedented assessments of regional aerosol variations and their influence on the environment, health, and climate over the past 20 years. However, there is a need to enhance such a high-quality AOD and PM2.5 concentration dataset with new robust features and extended spatial coverage. In this study, we present version 2 of a global-scale LGHAP dataset (LGHAP v2), which was generated using improved big Earth data analytics via a seamless integration of versatile data science, pattern recognition, and machine learning methods. Specifically, multimodal AODs and air quality measurements acquired from relevant satellites, ground monitoring stations, and numerical models were harmonized by harnessing the capability of random-forest-based data-driven models. Subsequently, an improved tensor-flow-based AOD reconstruction algorithm was developed to weave the harmonized multisource AOD products together for filling data gaps in Multi-Angle Implementation of Atmospheric Correction (MAIAC) AOD retrievals from Terra. The results of the ablation experiments demonstrated better performance of the improved tensor-flow-based gap-filling method in terms of both convergence speed and data accuracy. Ground-based validation results indicated good data accuracy of this global gap-free AOD dataset, with a correlation coefficient (R) of 0.85 and a root mean square error (RMSE) of 0.14 compared to the worldwide AOD observations from the AErosol RObotic NETwork (AERONET), outperforming the purely reconstructed AODs (R = 0.83, RMSE = 0.15), but they were slightly worse than raw MAIAC AOD retrievals (R = 0.88, RMSE = 0.11). For PM2.5 concentration mapping, a novel deep-learning approach, termed the SCene-Aware ensemble learning Graph ATtention network (SCAGAT), was hereby applied. While accounting for the scene representativeness of data-driven models across regions, the SCAGAT algorithm performed better during spatial extrapolation, largely reducing modeling biases over regions with limited and/or even absent in situ PM2.5 concentration measurements. The validation results indicated that the gap-free PM2.5 concentration estimates exhibit higher prediction accuracies, with an R of 0.95 and an RMSE of 5.7 µg m−3, compared to PM2.5 concentration measurements obtained from former holdout sites worldwide. Overall, while leveraging state-of-the-art methods in data science and artificial intelligence, a quality-enhanced LGHAP v2 dataset was generated through big Earth data analytics by cohesively weaving together multimodal AODs and air quality measurements from diverse sources. The gap-free, high-resolution, and global coverage merits render the LGHAP v2 dataset an invaluable database for advancing aerosol- and haze-related studies as well as triggering multidisciplinary applications for environmental management, health-risk assessment, and climate change attribution. All gap-free AOD and PM2.5 concentration grids in the LGHAP v2 dataset, as well as the data user guide and relevant visualization codes, are publicly accessible at https://zenodo.org/communities/ecnu_lghap (last access: 3 April 2024, Bai and Li, 2023a).
Earth System Science DataGEOSCIENCES, MULTIDISCIPLINARYMETEOROLOGY-METEOROLOGY & ATMOSPHERIC SCIENCES
CiteScore
18.00
自引率
5.30%
发文量
231
审稿时长
35 weeks
期刊介绍:
Earth System Science Data (ESSD) is an international, interdisciplinary journal that publishes articles on original research data in order to promote the reuse of high-quality data in the field of Earth system sciences. The journal welcomes submissions of original data or data collections that meet the required quality standards and have the potential to contribute to the goals of the journal. It includes sections dedicated to regular-length articles, brief communications (such as updates to existing data sets), commentaries, review articles, and special issues. ESSD is abstracted and indexed in several databases, including Science Citation Index Expanded, Current Contents/PCE, Scopus, ADS, CLOCKSS, CNKI, DOAJ, EBSCO, Gale/Cengage, GoOA (CAS), and Google Scholar, among others.