Optimal feature selection for improved ML based reconstruction of Global Terrestrial Water Storage Anomalies

IF 11.6 | CAS Zone 1 (Earth Sciences) | Q1 GEOSCIENCES, MULTIDISCIPLINARY

Earth System Science Data | Pub Date: 2024-05-17 | DOI: 10.5194/essd-2024-109

Nehar Mandal, Prabal Das, Kironmala Chanda

Citations: 0

Abstract

Understanding long-term terrestrial water storage (TWS) variations is vital for investigating hydrological extreme events, managing water resources, and assessing climate change impacts. However, the limited record length of the Gravity Recovery and Climate Experiment (GRACE) and its follow-on mission (GRACE-FO) hinders comprehensive long-term analysis. In this study, we reconstruct TWS anomalies (TWSA) for January 1960 to December 2022, thereby filling the data gaps between the GRACE and GRACE-FO missions and generating a complete dataset for the pre-GRACE era. The workflow uses a novel Bayesian Network (BN) technique to identify optimal predictors for grid-based TWSA simulations from land surface model (LSM) outputs, meteorological variables, and climate indices. Climate indices such as the Oceanic Niño Index and the Dipole Mode Index are selected as optimal predictors for a large number of grids globally, along with TWSA from LSM outputs. Convolutional Neural Network (CNN), Support Vector Regression (SVR), Extra Trees Regressor (ETR), and Stacking Ensemble Regression (SER) models are evaluated at each grid location to identify the most effective machine learning (ML) algorithm and achieve optimal reproducibility. Globally, ETR performs best for most grids, a pattern also observed at the river-basin scale, particularly for the Ganga-Brahmaputra-Meghna, Godavari, Krishna, Limpopo, and Nile river basins. The simulated TWSA (BNML_TWSA) outperforms the TWSA from LSM outputs when evaluated against GRACE datasets. Improvements are particularly notable in river basins such as the Godavari, Krishna, Danube, and Amazon; for the Godavari basin in India, the median correlation coefficient, Nash-Sutcliffe efficiency, and RMSE across all grids are 0.927, 0.839, and 63.7 mm, respectively. A comparison with TWSA reconstructed in recent studies indicates that BNML_TWSA outperforms them globally as well as in all 11 major river basins examined. The dataset is published at https://doi.org/10.6084/m9.figshare.25376695 (Mandal et al., 2024), and updates will be published when needed.
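The abstract reports grid-level skill using the correlation coefficient, Nash-Sutcliffe efficiency (NSE), and RMSE, and states that the best of several ML models is chosen at each grid location. The Python sketch below shows how these three metrics are conventionally computed and how a per-grid model choice could be made from them. It is an illustration under stated assumptions, not the authors' code: the selection rule (highest NSE), the function names, and the toy TWSA values are all hypothetical, and the candidate series merely stand in for predictions from models such as ETR or SVR.

```python
import math
from typing import Dict, Sequence


def _mean(xs: Sequence[float]) -> float:
    return sum(xs) / len(xs)


def pearson_r(obs: Sequence[float], sim: Sequence[float]) -> float:
    """Pearson correlation coefficient between observed and simulated series."""
    mo, ms = _mean(obs), _mean(sim)
    cov = sum((o - mo) * (s - ms) for o, s in zip(obs, sim))
    var_o = sum((o - mo) ** 2 for o in obs)
    var_s = sum((s - ms) ** 2 for s in sim)
    return cov / math.sqrt(var_o * var_s)  # undefined if either series is constant


def nse(obs: Sequence[float], sim: Sequence[float]) -> float:
    """Nash-Sutcliffe efficiency: 1 - sum((o - s)^2) / sum((o - mean(o))^2).

    1.0 is a perfect fit; 0.0 means the simulation is no better than the
    observed mean; negative values are worse than the mean.
    """
    mo = _mean(obs)
    sse = sum((o - s) ** 2 for o, s in zip(obs, sim))
    sst = sum((o - mo) ** 2 for o in obs)
    return 1.0 - sse / sst


def rmse(obs: Sequence[float], sim: Sequence[float]) -> float:
    """Root-mean-square error, in the units of the inputs (e.g. mm of TWSA)."""
    return math.sqrt(sum((o - s) ** 2 for o, s in zip(obs, sim)) / len(obs))


def select_best_model(obs: Sequence[float], candidates: Dict[str, Sequence[float]]) -> str:
    """Hypothetical per-grid rule: return the candidate model name with the highest NSE."""
    return max(candidates, key=lambda name: nse(obs, candidates[name]))


if __name__ == "__main__":
    # Toy TWSA values (mm) for a single grid cell; purely illustrative.
    grace = [10.0, -5.0, 20.0, 0.0]
    candidates = {
        "ETR": [9.0, -4.0, 19.0, 1.0],
        "SVR": [5.0, 0.0, 10.0, 5.0],
    }
    best = select_best_model(grace, candidates)
    sim = candidates[best]
    print(best, round(pearson_r(grace, sim), 3), round(nse(grace, sim), 3), round(rmse(grace, sim), 1))
```

Under this assumed rule, ETR is chosen for the toy grid because its NSE is higher than that of SVR. The paper may weigh several metrics or apply a different criterion; the abstract does not specify which.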

Source journal

Earth System Science Data: GEOSCIENCES, MULTIDISCIPLINARY; METEOROLOGY & ATMOSPHERIC SCIENCES

CiteScore: 18.00
Self-citation rate: 5.30%
Articles published: 231
Review time: 35 weeks

Journal description: Earth System Science Data (ESSD) is an international, interdisciplinary journal that publishes articles on original research data in order to promote the reuse of high-quality data in the field of Earth system sciences. The journal welcomes submissions of original data or data collections that meet the required quality standards and have the potential to contribute to the goals of the journal. It includes sections dedicated to regular-length articles, brief communications (such as updates to existing data sets), commentaries, review articles, and special issues. ESSD is abstracted and indexed in several databases, including Science Citation Index Expanded, Current Contents/PCE, Scopus, ADS, CLOCKSS, CNKI, DOAJ, EBSCO, Gale/Cengage, GoOA (CAS), and Google Scholar, among others.