Le Ngoc Hanh, Le Phuc Chi Lang, Phan Anh Hang, Nguyen Van An, Nguyen Hoang Son
A novel approach in comparing the performance of bivariate statistical methods, boosting, and stacking models in flood susceptibility assessment
Journal of Environmental Management, Volume 387, Article 125670 (impact factor 8.0; JCR Q1, Environmental Sciences). Published 2025-05-19. DOI: 10.1016/j.jenvman.2025.125670
https://www.sciencedirect.com/science/article/pii/S0301479725016469
Citations: 0
Abstract
Evaluating the performance of flood susceptibility assessment methods is critical for optimizing flood management strategies. This study presents a novel methodology for comparing bivariate statistical methods, boosting models, and stacking models to determine the most effective technique for flood susceptibility assessment in Hoa Vang District, Da Nang City, Vietnam. Twelve key factors were selected from an initial set of seventeen on the basis of information gain ratio (IGR) and multicollinearity analysis. The study extracted 2,172 samples from Sentinel-1 imagery and field survey data and randomly divided them into training (70%) and testing (30%) sets. The two primary indices used for the bivariate statistical approach were the weight of evidence (WoE) and the frequency ratio (FR). For the bivariate statistics, the study used two methods to classify the factors influencing flooding: the traditional Jenks natural breaks (JNB) and an improved version of JNB that accounts for correlation with flood data. Five boosting models (AdaBoost (AB), XGBoost (XGB), CatBoost (CB), Light Gradient Boosting Machine (LGB), and Gradient Boosting (GB)) were employed both independently and in combination as base learners within the stacking framework. Performance was evaluated with the area under the receiver operating characteristic curve (ROC-AUC), the Kappa statistic, and other indices. The results show that the stacking models delivered the highest evaluation performance, with an average score of 0.882, outperforming the boosting models (0.76) and significantly surpassing the flood susceptibility maps generated by the bivariate statistical methods WoE (0.282) and FR (0.136). The study identified high- and very-high-risk flood zones covering 14% of the district, concentrated in the southern communes. These findings provide valuable insights for enhancing flood susceptibility management and mitigation strategies, offering a robust tool for decision-making in flood-prone areas.
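The abstract names the frequency ratio (FR) and weight of evidence (WoE) without giving their formulas. The sketch below uses the definitions commonly found in the susceptibility-mapping literature; the abstract does not confirm that the paper uses exactly this form. The function names and all cell counts are hypothetical, chosen only to make the arithmetic easy to follow.

```python
import math


def frequency_ratio(flood_in_class, flood_total, cells_in_class, cells_total):
    """FR = (share of flood cells in the class) / (share of all cells in the class)."""
    return (flood_in_class / flood_total) / (cells_in_class / cells_total)


def weight_of_evidence(flood_in_class, flood_total, cells_in_class, cells_total):
    """Return (W+, W-, contrast) for one class of one conditioning factor.

    W+ = ln[P(class | flood) / P(class | no flood)]
    W- = ln[P(not class | flood) / P(not class | no flood)]
    contrast = W+ - W-
    """
    nonflood_in_class = cells_in_class - flood_in_class
    nonflood_total = cells_total - flood_total
    flood_outside = flood_total - flood_in_class
    nonflood_outside = nonflood_total - nonflood_in_class
    w_plus = math.log(
        (flood_in_class / flood_total) / (nonflood_in_class / nonflood_total)
    )
    w_minus = math.log(
        (flood_outside / flood_total) / (nonflood_outside / nonflood_total)
    )
    return w_plus, w_minus, w_plus - w_minus


# Hypothetical counts: a 10,000-cell study area with 1,000 flooded cells,
# and one factor class covering 2,000 cells, 500 of them flooded.
fr = frequency_ratio(500, 1000, 2000, 10000)
w_plus, w_minus, contrast = weight_of_evidence(500, 1000, 2000, 10000)
print(f"FR={fr:.3f}  W+={w_plus:.3f}  W-={w_minus:.3f}  C={contrast:.3f}")
```

With these counts the class holds half of the flooded cells but only a fifth of the area, so FR is about 2.5 and the contrast is positive, both indicating a class associated with flooding. The sketch does not handle classes containing zero flooded or zero non-flooded cells, where the logarithm is undefined; practical implementations usually apply some smoothing in that case.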
About the journal:
The Journal of Environmental Management publishes peer-reviewed, original research on all aspects of the management and managed use of the environment, both natural and man-made. Critical review articles are also welcome; submission of these is strongly encouraged.