Impact of sampling for landslide susceptibility assessment using interpretable machine learning models

IF 3.7 · CAS Zone 2 (Engineering & Technology) · JCR Q3 (Engineering, Environmental)

Bulletin of Engineering Geology and the Environment Pub Date : 2024-10-25 DOI:10.1007/s10064-024-03980-8

Bin Wu, Zhenming Shi, Hongchao Zheng, Ming Peng, Shaoqiang Meng

Volume 83, Issue 11. Full text: https://link.springer.com/article/10.1007/s10064-024-03980-8

Citations: 0

Abstract

Landslide susceptibility assessment has made significant strides in meeting the urgent requirements for disaster prevention and mitigation. However, the inherent imbalance in landslide distributions poses challenges, and various sampling strategies have emerged in response. These strategies alter the original dataset distribution, so their impact on susceptibility mapping needs to be better understood. This study integrates multi-source information, including morphological, geological, hydrological, and land-use data from northwestern Oregon, to train four models (Decision Trees, Random Forest, AdaBoost, and Gradient Tree Boosting) on both balanced and imbalanced training sets. Results reveal that models trained on imbalanced datasets generally exhibit superior classification performance. Models trained on balanced datasets predict more positives (landslides) at higher susceptibility levels, while those trained on imbalanced datasets classify more negatives at lower levels. Using the Shapley Additive Explanations (SHAP) method, the study establishes that the models make decisions consistently and identifies the five most influential factors: distance to roads, slope roughness, geological age, roughness, and elevation. The consequences of false negatives (FN) and false positives (FP) are also discussed: FN may lead to loss of life, whereas FP may result from prediction inaccuracies, dataset incompleteness, or forthcoming landslides, so a certain number of FP can be tolerated. The study therefore suggests that models trained on balanced datasets are preferable for minimizing the number of FN and for effectively capturing landslides in high and very high susceptibility areas. The findings provide valuable insights into the impact of the ratio of positives to negatives on landslide susceptibility and offer support for optimizing dataset sampling.
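The two ideas the abstract relies on, balancing a training set by sampling and counting FN and FP, can be illustrated with a minimal Python sketch. This is not the authors' pipeline: the abstract does not say which sampling strategy was used, so random undersampling of negatives is assumed here, and the function names (`undersample_negatives`, `confusion_counts`), the 5-in-100 inventory, and the predictions are all hypothetical.

```python
import random
from collections import Counter


def undersample_negatives(labels, neg_per_pos=1.0, seed=0):
    """Return sorted indices that keep every positive and a random subset of negatives.

    Assumption: random undersampling; neg_per_pos=1.0 gives a balanced (1:1) set.
    """
    rng = random.Random(seed)
    pos = [i for i, y in enumerate(labels) if y == 1]
    neg = [i for i, y in enumerate(labels) if y == 0]
    n_keep = min(len(neg), round(neg_per_pos * len(pos)))
    return sorted(pos + rng.sample(neg, n_keep))


def confusion_counts(y_true, y_pred):
    """Count TP, FP, FN, TN for binary labels (1 = landslide, 0 = no landslide)."""
    pairs = Counter(zip(y_true, y_pred))
    return {
        "TP": pairs[(1, 1)],
        "FP": pairs[(0, 1)],  # predicted landslide, none recorded
        "FN": pairs[(1, 0)],  # recorded landslide, missed by the model
        "TN": pairs[(0, 0)],
    }


# Hypothetical inventory: 5 landslide cells among 100 mapped cells.
labels = [1] * 5 + [0] * 95
balanced_idx = undersample_negatives(labels, neg_per_pos=1.0, seed=42)
balanced_labels = [labels[i] for i in balanced_idx]
print(Counter(balanced_labels))

# Hypothetical predictions, used only to show how FN and FP are tallied.
y_true = [1, 1, 1, 0, 0, 0, 0, 0]
y_pred = [1, 0, 1, 1, 0, 0, 0, 0]
print(confusion_counts(y_true, y_pred))
```

Raising `neg_per_pos` above 1.0 moves the training set back toward the original imbalanced distribution, which is the ratio whose effect on FN and FP the study examines.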


Source journal

Bulletin of Engineering Geology and the Environment (Engineering & Technology: Geosciences, Multidisciplinary)

CiteScore: 7.10

Self-citation rate: 11.90%

Publications: 445

Review time: 4.1 months

Journal description: Engineering geology is defined in the statutes of the IAEG as the science devoted to the investigation, study and solution of engineering and environmental problems which may arise as the result of the interaction between geology and the works or activities of man, as well as to the prediction of and development of measures for the prevention or remediation of geological hazards. Engineering geology embraces:
• the applications/implications of the geomorphology, structural geology, and hydrogeological conditions of geological formations;
• the characterisation of the mineralogical, physico-geomechanical, chemical and hydraulic properties of all earth materials involved in construction, resource recovery and environmental change;
• the assessment of the mechanical and hydrological behaviour of soil and rock masses;
• the prediction of changes to the above properties with time;
• the determination of the parameters to be considered in the stability analysis of engineering works and earth masses.