Flood susceptibility mapping: Integrating machine learning and GIS for enhanced risk assessment

Impact factor: 2.6; JCR Q2, Computer Science, Interdisciplinary Applications

Applied Computing and Geosciences. Pub Date: 2024-08-03. DOI: 10.1016/j.acags.2024.100183

Zelalem Demissie, Prashant Rimal, Wondwosen M. Seyoum, Atri Dutta, Glen Rimmington

Volume 23, Article 100183. Full text: https://www.sciencedirect.com/science/article/pii/S2590197424000302


Abstract

Flooding presents a formidable challenge in the United States, endangering lives and causing substantial economic damage that averages around $5 billion annually. Addressing this issue and improving community resilience are imperative. This project employed machine learning techniques and publicly available data to explore the factors influencing flooding and to develop flood susceptibility maps at various spatial resolutions. Six machine learning algorithms were used: Logistic Regression (LR), Random Forest (RF), Support Vector Machine (SVM), K-Nearest Neighbor (KNN), Adaptive Boosting (AdaBoost), and Extreme Gradient Boosting (XGB). Geospatial datasets comprising thirteen predictor variables and 1528 flood inventory records collected since 1996 were analyzed. The predictor variables are rainfall, elevation, slope, aspect, flow direction, flow accumulation, Topographic Wetness Index (TWI), distance from the nearest stream, evapotranspiration, land cover, impervious surface, land surface temperature, and hydrologic soil group. Five hundred twenty-eight non-flood data points were randomly created using a stream buffer for two scenarios. A total of 2964 data points were classified into flooded (1) and non-flooded (0) categories and used as the target. Overall, testing results showed that the XGB and RF models performed well relative to the other models in both scenarios and across multiple resolutions, with accuracy ranging from 0.82 to 0.97. Variable importance analysis showed that predictor variables such as distance from streams, hydrologic soil group, rainfall, elevation, and impervious surface significantly affected flood prediction, suggesting a strong association with the underlying driving processes. The improved performance and the variation in susceptible areas across the two scenarios showed that considering predictor variables at multiple resolutions and appropriate non-flood training points is critical for developing flood susceptibility models. Furthermore, using tree-based ensemble algorithms such as RF and XGB in a stacked generalization approach can help achieve robustness in a flood susceptibility model where multiple algorithms are being evaluated.
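
Of the thirteen predictors, the Topographic Wetness Index is the one derived from other layers, so a small worked example may help. The sketch below uses the conventional definition TWI = ln(a / tan β), where a is the specific catchment area and β is the local slope. The abstract does not say how the authors computed TWI, so the function name, the 30 m cell size, and the minimum-slope floor for flat cells are all illustrative assumptions, not the paper's method.

```python
import math


def topographic_wetness_index(flow_accumulation, slope_deg,
                              cell_size=30.0, min_slope_deg=0.1):
    """Return TWI = ln(a / tan(beta)) for a single raster cell.

    flow_accumulation: number of upstream cells draining into this cell
    slope_deg:         local slope in degrees
    cell_size:         raster resolution in metres (assumed value)
    min_slope_deg:     floor applied to flat cells so tan(beta) is never
                       zero (assumed value)
    """
    # Specific catchment area: upslope area (counting the cell itself)
    # per unit contour width, i.e. (n + 1) * cell_size**2 / cell_size.
    specific_area = (flow_accumulation + 1) * cell_size
    beta = math.radians(max(slope_deg, min_slope_deg))
    return math.log(specific_area / math.tan(beta))


# A gently sloping valley cell with a large upstream area versus a
# steep ridge cell with almost no upstream area.
valley = topographic_wetness_index(flow_accumulation=500, slope_deg=1.0)
ridge = topographic_wetness_index(flow_accumulation=2, slope_deg=20.0)
print(f"valley TWI = {valley:.2f}, ridge TWI = {ridge:.2f}")
```

Cells with a large contributing area and a gentle slope get a higher TWI, which is why the index is commonly used as a proxy for where water tends to accumulate.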
