Unveiling flood-generating mechanisms using circular statistics-based machine learning approach without the need for discharge data during inference

IF 2.4 · Zone 4, Environmental Science and Ecology · JCR Q2, Water Resources

Hydrology Research Pub Date : 2023-09-20 DOI:10.2166/nh.2023.058

Zhi Zhang, Dagang Wang, Xinxin Wu, Yiwen Mei, Jianxiu Qiu, Jinxin Zhu



Abstract

Understanding the drivers of flooding is essential for flood disaster prevention. However, conventional flood prediction methods are hindered by their reliance on local discharge data, which can be constrained by limited spatial resolution. To address this limitation, we present a machine learning model that can categorize floods without requiring discharge data during inference. We first use circular statistics to calculate the relative importance of three candidate flood-generating mechanisms. Global land areas are then classified into three primary categories and eight sub-categories based on the proportions of relative importance. A random forest (RF) model is then applied to identify the flood types under the assumption that discharge data are unavailable. The circular statistics show that, globally, soil moisture excess is the most influential driver of floods, followed by extreme precipitation and snowmelt, with average relative importances of 0.535, 0.387, and 0.078, respectively. The RF model reproduces the three primary flood categories well, with an accuracy of 0.701 and an F1-score of 0.692 in 10-fold cross-validation. The trained grid-based model provides a fast and efficient way to analyze flood mechanisms, even where discharge data are limited.
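The abstract does not spell out the circular-statistics step, so the following is only a minimal sketch of the standard building block such analyses usually rest on: mapping event dates onto a circle and computing their mean date and concentration. The function name `circular_stats`, the 365.25-day period, and the sample dates are illustrative assumptions, not the authors' implementation, and the sketch does not reproduce their relative-importance calculation.

```python
import math


def circular_stats(days_of_year, period=365.25):
    """Return (mean_day, concentration) for a list of event dates.

    Hypothetical helper: each date is mapped to an angle on the unit
    circle, so that late-December and early-January events count as close.
    """
    if not days_of_year:
        raise ValueError("need at least one date")
    angles = [2.0 * math.pi * d / period for d in days_of_year]
    # Mean of the unit vectors pointing at each date.
    x = sum(math.cos(a) for a in angles) / len(angles)
    y = sum(math.sin(a) for a in angles) / len(angles)
    # Direction of the mean vector, converted back to a day of year.
    mean_angle = math.atan2(y, x) % (2.0 * math.pi)
    mean_day = mean_angle * period / (2.0 * math.pi)
    # Length of the mean vector: 1 = all events on one date, 0 = no seasonality.
    concentration = math.hypot(x, y)
    return mean_day, concentration


# Made-up annual-maximum flood dates (day of year) clustered around New Year.
flood_days = [355, 360, 2, 8, 15]
mean_day, concentration = circular_stats(flood_days)
print(f"mean day: {mean_day:.1f}, concentration: {concentration:.3f}")
```

An ordinary arithmetic mean of these dates would land in mid-year, far from any actual event, which is why seasonality comparisons between floods and their candidate drivers are usually made on the circle.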


Source journal

Hydrology Research (Water Resources)

CiteScore

5.00

Self-citation rate

7.40%

Review time

3.8 months

Journal description: Hydrology Research provides international coverage on all aspects of hydrology in its widest sense, and welcomes the submission of papers from across the subject. While emphasis is placed on studies of the hydrological cycle, the Journal also covers the physics and chemistry of water. Hydrology Research is intended to be a link between basic hydrological research and the practical application of scientific results within the broad field of water management.