Landslide susceptibility prediction modelling based on semi-supervised XGBoost model

IF 1.4; Zone 4 (Earth Sciences); JCR Q3 (Geosciences, Multidisciplinary)
Geological Journal. Pub Date: 2024-03-08. DOI: 10.1002/gj.4936
Qiangqiang Shua, Hongbin Peng, Jingkai Li
Citations: 0

Abstract

Landslide susceptibility prediction (LSP) modelling suffers from problems in the landslide and non-landslide samples that make up the model dataset: recorded landslide samples contain errors, and non-landslide samples are selected with subjective randomness and low accuracy. To address these problems, a semi-supervised machine learning model for LSP is proposed. First, Yanchang County of Shaanxi Province, China, is taken as the study area. Second, the frequency ratio values of 12 environmental factors (elevation, slope, aspect, etc.), together with a randomly selected set of non-landslide samples twice the number of landslide samples, form the initial model datasets. Third, an extreme gradient boosting (XGBoost) model is trained and tested on the initial datasets to produce initial landslide susceptibility maps (LSMs), which are divided into very low, low, moderate, high and very high susceptibility levels. Next, landslide samples that fall in the very low and low susceptibility levels of the initial LSMs are excluded to improve the accuracy of the landslide samples, and non-landslide samples are randomly selected from the unlabelled samples that fall in the low and very low susceptibility levels to ensure the accuracy of the non-landslide samples. The newly obtained landslide and non-landslide samples are fed back into the XGBoost model to construct the semi-supervised XGBoost (SSXGBoost) model. Finally, accuracy, the kappa coefficient and statistical indexes of the susceptibility indexes are used to assess the LSP performance of the XGBoost and SSXGBoost models. The results show that the SSXGBoost model achieves markedly better LSP performance than the XGBoost model. In conclusion, the proposed SSXGBoost model addresses two problems: the limited accuracy of landslide samples and the difficulty of selecting non-landslide samples accurately.
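The sample-refinement step and the evaluation metrics described above can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the authors' implementation. All function names are hypothetical, the equal-interval class breaks are an assumption (the abstract does not say how the five levels are divided), and the susceptibility index is assumed to be a value in [0, 1], such as a predicted landslide probability from a trained XGBoost classifier. The classifier itself is left out so that the sketch needs only the standard library.

```python
import random
from collections import Counter

LEVELS = ("very low", "low", "moderate", "high", "very high")


def frequency_ratio(factor_classes, is_landslide):
    """Frequency ratio per factor class: the share of landslide cells that
    fall in the class divided by the share of all cells in the class."""
    total = len(factor_classes)
    n_landslides = sum(is_landslide)
    cells = Counter(factor_classes)
    slides = Counter(c for c, y in zip(factor_classes, is_landslide) if y)
    return {c: (slides[c] / n_landslides) / (cells[c] / total) for c in cells}


def susceptibility_level(index, breaks=(0.2, 0.4, 0.6, 0.8)):
    """Map a susceptibility index in [0, 1] to a level 0..4 (see LEVELS).
    Equal-interval breaks are an assumption, not taken from the paper."""
    return sum(index >= b for b in breaks)


def refine_samples(landslide_ids, unlabelled_ids, index, n_non_landslide, seed=0):
    """Drop landslide samples mapped to very low/low susceptibility, and draw
    non-landslide samples at random from unlabelled cells mapped to the same
    two levels. `index` maps a cell id to its initial susceptibility index."""
    def is_low(cell):
        return susceptibility_level(index[cell]) <= 1

    kept_landslides = [c for c in landslide_ids if not is_low(c)]
    pool = [c for c in unlabelled_ids if is_low(c)]
    rng = random.Random(seed)
    non_landslides = rng.sample(pool, min(n_non_landslide, len(pool)))
    return kept_landslides, non_landslides


def cohen_kappa(y_true, y_pred):
    """Cohen's kappa: observed agreement corrected for chance agreement."""
    n = len(y_true)
    observed = sum(t == p for t, p in zip(y_true, y_pred)) / n
    true_counts, pred_counts = Counter(y_true), Counter(y_pred)
    expected = sum(true_counts[k] * pred_counts[k] for k in true_counts) / n ** 2
    if expected == 1:  # both label sets contain a single identical class
        return 1.0
    return (observed - expected) / (1 - expected)


if __name__ == "__main__":
    # Toy initial susceptibility indexes for five grid cells (made-up values).
    initial_index = {0: 0.9, 1: 0.1, 2: 0.05, 3: 0.7, 4: 0.3}
    landslides, non_landslides = refine_samples([0, 1], [2, 3, 4], initial_index, 2)
    print("landslide samples kept:", landslides)
    print("non-landslide samples drawn:", sorted(non_landslides))
```

In a full workflow the refined samples returned by `refine_samples` would be used to retrain the classifier, and `cohen_kappa` together with plain accuracy would compare the initial and retrained models on held-out samples.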

Source journal: Geological Journal (Earth Sciences: Geosciences, Multidisciplinary)
CiteScore: 4.20
Self-citation rate: 11.10%
Articles published: 269
Review time: 3 months
About the journal: In recent years there has been a growth of specialist journals within the geological sciences. Nevertheless, there is an important role for a journal of an interdisciplinary kind. Traditionally, Geological Journal has been such a journal and continues in its aim of promoting interest in all branches of the geological sciences through publication of original research papers and review articles. The journal publishes Special Issues with a common theme or regional coverage (e.g. Chinese Dinosaurs; Tectonics of the Eastern Mediterranean; Triassic basins of the Central and North Atlantic Borderlands). These are extensively cited. The journal has a particular interest in publishing papers on regional case studies from any global locality which have conclusions of general interest. Such papers may emphasize aspects across the full spectrum of geological sciences.