Towards long-lead forecasting of extreme flood events: a data mining framework for precipitation cluster precursors identification

Proceedings of the 19th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. Pub Date: 2013-08-11. DOI: 10.1145/2487575.2488220

Dawei Wang, W. Ding, Kui Yu, Xindong Wu, Ping Chen, D. Small, S. Islam


Citations: 18

Abstract

Developing flood forecasting techniques that can warn of disastrous floods at a long lead time (5-15 days) is of great importance to society. An extreme flood is usually the consequence of a sequence of precipitation events occurring over several days to several weeks. Although precise short-term forecasting of the magnitude and extent of individual precipitation events is still beyond our reach, long-term forecasting of precipitation clusters can be attempted by identifying persistent atmospheric regimes that are conducive to such clusters. However, such forecasting suffers from an overwhelming number of relevant features and highly imbalanced sample sets. In this paper, we propose an integrated data mining framework for identifying the precursors to precipitation event clusters and for using this information to predict extended periods of extreme precipitation and subsequent floods. We synthesize a representative feature set that describes atmospheric motion and apply a streaming feature selection algorithm to identify the precipitation precursors online from the enormous feature space. A hierarchical re-sampling approach is embedded in the framework to deal with the imbalance problem. An extensive empirical study is conducted on historical precipitation and associated flood data collected in the State of Iowa. Using our framework, a few physically meaningful precipitation cluster precursor sets are identified from millions of features. More than 90% of extreme precipitation events are captured by the proposed prediction model using precipitation cluster precursors with a lead time of more than 5 days.
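The two ingredients named in the abstract, streaming feature selection and re-sampling for class imbalance, can be illustrated with a minimal sketch. This is not the paper's algorithm: a simple Pearson-correlation relevance and redundancy test stands in for the streaming selector, plain random undersampling stands in for the hierarchical re-sampling, and the feature names, thresholds, and toy values are all hypothetical.

```python
import math
import random


def pearson(x, y):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / math.sqrt(sxx * syy)


def undersample(labels, rng):
    """Return sorted row indices with the majority class randomly cut to minority size.

    Assumption: a stand-in for the paper's hierarchical re-sampling, not a copy of it.
    """
    pos = [i for i, v in enumerate(labels) if v == 1]
    neg = [i for i, v in enumerate(labels) if v == 0]
    minority, majority = (pos, neg) if len(pos) <= len(neg) else (neg, pos)
    return sorted(minority + rng.sample(majority, len(minority)))


def stream_select(feature_stream, labels, relevance=0.5, redundancy=0.9):
    """Consume (name, column) pairs one at a time and keep a small selected set.

    Assumption: the thresholds and the correlation tests are illustrative choices.
    """
    selected = {}
    for name, column in feature_stream:
        if abs(pearson(column, labels)) < relevance:
            continue  # weakly related to the label: discard on arrival
        if any(abs(pearson(column, kept)) >= redundancy for kept in selected.values()):
            continue  # nearly duplicates a feature that is already kept
        selected[name] = column
    return list(selected)


# Imbalanced toy labels: 8 ordinary periods, 2 extreme-precipitation periods.
raw_labels = [0, 0, 0, 0, 0, 0, 0, 0, 1, 1]
balanced_rows = undersample(raw_labels, random.Random(0))

# Hypothetical features on an already balanced 4-row sample, arriving as a stream.
labels = [0, 0, 1, 1]
stream = iter([
    ("ridge", [0.1, 0.2, 0.9, 1.0]),       # strongly related to the label
    ("ridge_copy", [0.2, 0.4, 1.8, 2.0]),  # rescaled copy of "ridge"
    ("noise", [1.0, 0.0, 1.0, 0.0]),       # unrelated to the label
    ("jet", [0.0, 1.0, 1.0, 2.0]),         # related, but distinct from "ridge"
])
precursors = stream_select(stream, labels)
```

Because each feature is tested on arrival and either kept or dropped, the full feature space never has to be held in memory, which is the property that makes a streaming selector attractive when candidates number in the millions.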
