Landslide susceptibility prediction modelling based on semi-supervised XGBoost model
Authors: Qiangqiang Shua, Hongbin Peng, Jingkai Li
Journal: Geological Journal (impact factor 1.4; JCR Q3, Geosciences, Multidisciplinary)
Publication date: 2024-03-08
DOI: 10.1002/gj.4936
URL: https://onlinelibrary.wiley.com/doi/10.1002/gj.4936
Citations: 0
Abstract
Landslide susceptibility prediction (LSP) modelling suffers from problems in the landslide and non-landslide samples of the model dataset, such as errors in the landslide samples and the subjective randomness and low accuracy of non-landslide sample selection. To address these problems, a semi-supervised machine learning model for LSP is proposed. Firstly, Yanchang County of Shaanxi Province, China, is taken as the study area. Secondly, the frequency ratio values of 12 environmental factors (elevation, slope, aspect, etc.), together with randomly selected non-landslide samples numbering twice the landslide samples, are used to form the initial model datasets. Thirdly, an extreme gradient boosting (XGBoost) model is trained and tested on the initial datasets to produce initial landslide susceptibility maps (LSMs), which are divided into very low, low, moderate, high and very high susceptibility levels. Next, landslide samples that fall in the very low and low susceptibility levels of the initial LSMs are excluded to improve the accuracy of the landslide samples, and non-landslide samples, again twice in number, are randomly selected from the unlabelled areas with low and very low susceptibility levels in the initial LSMs to ensure the accuracy of the non-landslide samples. These newly obtained landslide and non-landslide samples are fed back into the XGBoost model to construct the semi-supervised XGBoost (SSXGBoost) model. Finally, accuracy, the kappa coefficient and statistics of the susceptibility indexes are used to assess the LSP performance of the XGBoost and SSXGBoost models. Results show that the SSXGBoost model has remarkably better LSP performance than the XGBoost model. In conclusion, the proposed SSXGBoost model effectively overcomes the limited accuracy of landslide samples and the difficulty of selecting non-landslide samples accurately.
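The sample-refinement step described in the abstract can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the authors' implementation: the equal-interval breaks at 0.2, 0.4, 0.6 and 0.8, the function names `susceptibility_level` and `refine_samples`, the `ratio=2` reading of "twice", and the toy scores are all hypothetical. The susceptibility indexes are taken as given; in the paper they would come from the initial XGBoost model.

```python
import random

# Hypothetical equal-interval breaks on a 0-1 susceptibility index; the
# abstract does not say how the five levels are delimited.
LEVELS = ["very low", "low", "moderate", "high", "very high"]
BREAKS = [0.2, 0.4, 0.6, 0.8]
LOW_LEVELS = {"very low", "low"}


def susceptibility_level(index):
    """Map a susceptibility index in [0, 1] to one of the five levels."""
    for upper, level in zip(BREAKS, LEVELS):
        if index < upper:
            return level
    return LEVELS[-1]


def refine_samples(landslide_scores, unlabelled_scores, ratio=2, seed=0):
    """Return (kept_landslide_ids, new_non_landslide_ids).

    Both arguments map a grid-cell id to the susceptibility index predicted
    by the initial model. `ratio=2` is an assumed reading of "twice".
    """
    # Drop recorded landslides that the initial map rates low or very low.
    kept = [cell for cell, score in landslide_scores.items()
            if susceptibility_level(score) not in LOW_LEVELS]
    # Non-landslide candidates: unlabelled cells rated low or very low.
    candidates = sorted(cell for cell, score in unlabelled_scores.items()
                        if susceptibility_level(score) in LOW_LEVELS)
    # Draw `ratio` times as many non-landslides as kept landslides,
    # capped by the number of available candidates.
    n = min(len(candidates), ratio * len(kept))
    non_landslides = random.Random(seed).sample(candidates, n)
    return kept, non_landslides


# Toy susceptibility indexes (made up for illustration only).
landslides = {"a": 0.91, "b": 0.15, "c": 0.67, "d": 0.35}
unlabelled = {"u1": 0.05, "u2": 0.55, "u3": 0.22, "u4": 0.11,
              "u5": 0.83, "u6": 0.31, "u7": 0.18}
kept, non_landslides = refine_samples(landslides, unlabelled)
print(kept, non_landslides)
```

The refined `kept` and `non_landslides` sets would then be used to retrain the classifier, which is the step the abstract calls SSXGBoost. Keeping the refinement independent of the classifier makes it easy to test on its own.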
About the journal:
In recent years there has been a growth of specialist journals within geological sciences. Nevertheless, there is an important role for a journal of an interdisciplinary kind. Traditionally, GEOLOGICAL JOURNAL has been such a journal and continues in its aim of promoting interest in all branches of the Geological Sciences, through publication of original research papers and review articles. The journal publishes Special Issues with a common theme or regional coverage (e.g. Chinese Dinosaurs; Tectonics of the Eastern Mediterranean; Triassic basins of the Central and North Atlantic Borderlands). These are extensively cited.
The Journal has a particular interest in publishing papers on regional case studies from any global locality which have conclusions of general interest. Such papers may emphasize aspects across the full spectrum of geological sciences.