Uncertainty pattern and an integration strategy in flood susceptibility modeling: Limited sample size
Authors: Jun Liu, Xueqiang Zhao, Yangbo Chen, Huaizhang Sun, Yu Gu, Shichao Xu
Journal: Journal of Hydrology, volume 658, Article 133184
Publication date: 2025-03-27
Publication type: Journal Article
DOI: 10.1016/j.jhydrol.2025.133184
URL: https://www.sciencedirect.com/science/article/pii/S0022169425005220
Impact factor: 5.9 · JCR: Q1 (Engineering, Civil) · Region category: Earth Sciences (Region 1)
Citations: 0
Abstract
Floods are among the most destructive natural disasters worldwide. Using machine learning models to construct flood susceptibility maps has emerged as an effective strategy for disaster prevention and management. Sample size is one of the primary sources of uncertainty in machine learning models, which poses significant challenges for flood susceptibility modeling in data-scarce regions. However, the understanding of uncertainty patterns, and of effective methods for improving modeling accuracy under limited-sample conditions, is still evolving. Here, we applied uncertainty analysis theory to clarify this pattern for seven base machine learning models. Further, an integration strategy was developed by coupling geographical similarity, semi-supervised learning, and active learning. The analysis of the uncertainty pattern indicates that the base machine learning models differ in their tolerance to changes in sample size. Specifically, a threshold exists below which model accuracy declines sharply, leading to significant changes in the distribution patterns of the predicted flood susceptibility maps. The proposed integration strategy can enhance the accuracy and stability of models operating with limited sample sizes. With the strategy applied and the number of labeled samples increased from 10 to 500, the average AUC values improved as follows: RF from 0.76 to 0.85, SVM from 0.46 to 0.86, MLP from 0.77 to 0.86, NB from 0.75 to 0.86, KNN from 0.72 to 0.83, DT from 0.65 to 0.78, and LR from 0.70 to 0.86. The insights into the uncertainty pattern derived from this study can help guide the balancing of sample collection costs against model accuracy. Moreover, the proposed integration strategy is expected to improve flood susceptibility prediction in areas with limited samples.
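The sample-size sensitivity described in the abstract can be illustrated with a small experiment: train a classifier on repeated random draws of n labeled samples and record the mean and spread of the test AUC for each n. The sketch below is only an illustration under stated assumptions, not the authors' procedure. It uses synthetic two-feature data and a nearest-centroid scorer as a stand-in (it is not one of the seven models in the study), and every function name in it is hypothetical.

```python
import random
import statistics


def auc(scores, labels):
    """Area under the ROC curve via the rank-sum (Mann-Whitney) identity."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    # Count positive/negative pairs ranked correctly; ties count as half.
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def make_dataset(rng, n):
    """Balanced synthetic data (assumption): class 1 features are shifted by +1."""
    labels = [i % 2 for i in range(n)]
    points = [[rng.gauss(float(y), 1.0), rng.gauss(float(y), 1.0)] for y in labels]
    return points, labels


def centroid(points):
    """Per-feature mean of a list of points."""
    return [sum(col) / len(points) for col in zip(*points)]


def fit_predict(train_x, train_y, test_x):
    """Nearest-centroid scorer: a higher score means closer to the class-1 centroid."""
    c1 = centroid([x for x, y in zip(train_x, train_y) if y == 1])
    c0 = centroid([x for x, y in zip(train_x, train_y) if y == 0])

    def dist2(a, b):
        return sum((ai - bi) ** 2 for ai, bi in zip(a, b))

    return [dist2(x, c0) - dist2(x, c1) for x in test_x]


def auc_by_sample_size(sizes, repeats=30, seed=0):
    """Return {n: (mean AUC, AUC standard deviation)} over repeated training draws."""
    rng = random.Random(seed)
    test_x, test_y = make_dataset(rng, 200)  # fixed held-out set
    summary = {}
    for n in sizes:
        aucs = []
        for _ in range(repeats):
            train_x, train_y = make_dataset(rng, n)
            aucs.append(auc(fit_predict(train_x, train_y, test_x), test_y))
        summary[n] = (statistics.mean(aucs), statistics.stdev(aucs))
    return summary


if __name__ == "__main__":
    for n, (mean_auc, sd_auc) in auc_by_sample_size([10, 50, 100, 500]).items():
        print(f"n={n:4d}  mean AUC={mean_auc:.3f}  sd={sd_auc:.3f}")
```

Plotting the mean and standard deviation against n for a real model and real flood inventory data would be one way to look for the kind of threshold the abstract mentions. The synthetic setup here makes no claim about where such a threshold lies.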
About the journal:
The Journal of Hydrology publishes original research papers and comprehensive reviews in all subfields of the hydrological sciences, including water-based management and policy issues that affect economics and society. These comprise, but are not limited to, the physical, chemical, biogeochemical, stochastic, and systems aspects of surface and groundwater hydrology, hydrometeorology, and hydrogeology. Relevant topics that incorporate the insights and methodologies of disciplines such as climatology, water resource systems, hydraulics, agrohydrology, geomorphology, soil science, instrumentation and remote sensing, and civil and environmental engineering are included. Social science perspectives on hydrological problems, such as resource and ecological economics, environmental sociology, psychology and behavioural science, and management and policy analysis, are also invited. Multi- and interdisciplinary analyses of hydrological problems are within scope. The science published in the Journal of Hydrology is relevant to catchment scales rather than exclusively to a local scale or site.