Evaluating Landslide Susceptibility Using Sampling Methodology and Multiple Machine Learning Models

Yingze Song, Degang Yang, Weicheng Wu, Xin Zhang, Jie Zhou, Zhaoxu Tian, Chencan Wang, Yingxu Song
ISPRS Int. J. Geo Inf., article 197. Published 2023-05-13. DOI: 10.3390/ijgi12050197 (https://doi.org/10.3390/ijgi12050197)
Citation count: 4

Abstract

Landslide susceptibility assessment (LSA) based on machine learning is widely used in landslide geological hazard management and research. However, sample imbalance, in which landslide samples are far fewer than non-landslide samples, is often overlooked, even though it is one of the important factors affecting the performance of landslide susceptibility models. In this paper, we take the Wanzhou district of Chongqing city as an example, where the dataset contains more than 580,000 samples and the ratio of positive to negative samples is 1:19. We oversample or undersample the imbalanced samples to balance the classes, and then compare the performance of machine learning models trained under the different sampling strategies. Three classic machine learning algorithms, logistic regression, random forest, and LightGBM, are used for LSA modeling. The results show that models trained directly on the imbalanced dataset perform worst: their recall is extremely low, indicating that they can barely identify landslide samples and cannot be applied in practice. Compared with the original dataset, the resampled datasets yield improved predictive performance across various classifiers, reflected in higher AUC and recall. The best model was the random forest model trained with oversampling (O_RF), with AUC = 0.932.
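The resampling idea in the abstract can be illustrated with a minimal sketch. It assumes plain random oversampling (duplicating minority rows) and random undersampling (discarding majority rows); the abstract does not say which specific resampling algorithms the authors used, so this is only one possible reading. The toy data, the function names, and the 1,000-sample size are hypothetical; only the 1:19 class ratio comes from the abstract. The sketch also shows why an imbalanced dataset can produce a model with high accuracy but near-zero recall.

```python
import random
from collections import Counter


def random_oversample(X, y, seed=0):
    """Duplicate minority-class rows (with replacement) until every class
    has as many rows as the largest class. Assumed method, not the paper's."""
    rng = random.Random(seed)
    counts = Counter(y)
    target = max(counts.values())
    X_out, y_out = list(X), list(y)
    for label, n in counts.items():
        idx = [i for i, lab in enumerate(y) if lab == label]
        extra = [rng.choice(idx) for _ in range(target - n)]
        X_out.extend(X[i] for i in extra)
        y_out.extend(label for _ in extra)
    return X_out, y_out


def random_undersample(X, y, seed=0):
    """Keep a random subset of each class, sized to the smallest class.
    Assumed method, not the paper's."""
    rng = random.Random(seed)
    counts = Counter(y)
    target = min(counts.values())
    keep = []
    for label in counts:
        idx = [i for i, lab in enumerate(y) if lab == label]
        keep.extend(rng.sample(idx, target))
    keep.sort()
    return [X[i] for i in keep], [y[i] for i in keep]


def recall(y_true, y_pred, positive=1):
    """Fraction of true positives (landslide samples) that were predicted positive."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p == positive)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p != positive)
    return tp / (tp + fn) if (tp + fn) else 0.0


# Hypothetical toy data with the 1:19 positive-to-negative ratio from the abstract.
y = [1] * 50 + [0] * 950
X = [[float(i)] for i in range(len(y))]

X_over, y_over = random_oversample(X, y)
X_under, y_under = random_undersample(X, y)

# A degenerate classifier that always predicts "non-landslide" looks accurate
# on the imbalanced data, yet it never detects a landslide.
always_negative = [0] * len(y)
accuracy = sum(1 for t, p in zip(y, always_negative) if t == p) / len(y)
missed_recall = recall(y, always_negative)

print(Counter(y_over), Counter(y_under), accuracy, missed_recall)
```

In practice, resampling would be applied only to the training split, so that AUC and recall are measured on data that keeps the original class ratio.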