Investigating landslide data balancing for susceptibility mapping using generative and machine learning models
Authors: Yuhang Jiang, Wei Wang, Lifang Zou, Yajun Cao, Wei-Chau Xie
Journal: Landslides (JCR Q1, Engineering, Geological; impact factor 5.8)
Published: 2024-09-05
DOI: https://doi.org/10.1007/s10346-024-02352-3
Citations: 0
Abstract
Machine learning has driven significant advances in landslide susceptibility mapping. However, because field landslide investigations are difficult, current susceptibility mapping usually suffers from too few landslide samples (positive samples) and unreliable non-landslide samples (negative samples). Taking Lianghe County in Yunnan Province, China, as a case study, this paper evaluates three oversampling models for generating positive landslide samples: Conditional Tabular Generative Adversarial Networks (CTGAN), Generative Adversarial Networks (GAN), and the traditional Synthetic Minority Oversampling Technique (SMOTE). Three machine learning classifiers are used for landslide susceptibility assessment: a 1D Convolutional Neural Network-Long Short-Term Memory network (1D CNN-LSTM), Random Forest (RF), and Gradient Boosting Decision Tree (GBDT). We also devise a screening method for non-landslide data (negative samples) that uses a self-trained support vector machine within a semi-supervised framework. Training on the dataset after negative sample screening raises the AUC of the 1D CNN-LSTM, RF, and GBDT models from 0.778, 0.869, and 0.849 to 0.837, 0.936, and 0.877, respectively. Compared with training on the original set, all three classifiers predict more accurately after training on data augmented by CTGAN, GAN, or SMOTE. The RF model, augmented with 200 positive samples generated by CTGAN, achieves the highest prediction accuracy in the study (AUC = 0.962). The 1D CNN-LSTM model achieves its highest prediction accuracy (AUC = 0.953) when augmented with 200 positive samples from GAN. Similarly, the GBDT model reaches its highest prediction accuracy (AUC = 0.928) when augmented with 200 positive samples created by SMOTE. In addition, the spatial distribution of the data indicates that samples generated by the generative adversarial model exhibit higher diversity and can be used for landslide susceptibility assessment.
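For readers unfamiliar with SMOTE, the interpolation idea behind it can be sketched in a few lines of standard-library Python. This is an illustrative simplification, not the authors' implementation: the function name `smote`, its parameters, and the two-feature sample values are hypothetical, and base samples are drawn at random rather than visited in turn as in the original algorithm.

```python
import math
import random


def smote(minority, n_new, k=5, seed=0):
    """Create n_new synthetic points by interpolating between minority-class neighbours."""
    if len(minority) < 2:
        raise ValueError("need at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    synthetic = []
    for _ in range(n_new):
        # Pick a random minority sample as the base point.
        i = rng.randrange(len(minority))
        base = minority[i]
        # Rank the other minority samples by Euclidean distance to the base.
        others = [j for j in range(len(minority)) if j != i]
        others.sort(key=lambda j: math.dist(base, minority[j]))
        # Choose one of the k nearest neighbours and interpolate towards it.
        neighbour = minority[rng.choice(others[:k])]
        gap = rng.random()  # interpolation factor in [0, 1)
        synthetic.append([b + gap * (n - b) for b, n in zip(base, neighbour)])
    return synthetic


# Hypothetical normalised conditioning factors (e.g. slope, elevation)
# for four landslide cells; the values are made up for illustration.
landslides = [[0.62, 0.35], [0.70, 0.40], [0.58, 0.31], [0.66, 0.45]]
new_samples = smote(landslides, n_new=200, k=3)
print(len(new_samples))
```

Every synthetic point lies on a line segment between two existing landslide samples, which is consistent with the abstract's observation that GAN-generated data shows higher diversity than interpolation-based oversampling.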
About the journal:
Landslides are gravitational mass movements of rock, debris or earth. They may occur in conjunction with other major natural disasters such as floods, earthquakes and volcanic eruptions. Expanding urbanization and changing land-use practices have increased the incidence of landslide disasters. As catastrophic events, landslides cause human injury, loss of life and economic devastation, and are studied as part of the earth, water and engineering sciences. The aim of the journal Landslides is to be the common platform for the publication of integrated research on landslide processes, hazards, risk analysis, mitigation, and the protection of our cultural heritage and the environment. The journal publishes research papers, news of recent landslide events and information on the activities of the International Consortium on Landslides.
- Landslide dynamics, mechanisms and processes
- Landslide risk evaluation: hazard assessment, hazard mapping, and vulnerability assessment
- Geological, Geotechnical, Hydrological and Geophysical modeling
- Effects of meteorological, hydrological and global climatic change factors
- Monitoring including remote sensing and other non-invasive systems
- New technology, expert and intelligent systems
- Application of GIS techniques
- Rock slides, rock falls, debris flows, earth flows, and lateral spreads
- Large-scale landslides, lahars and pyroclastic flows in volcanic zones
- Marine and reservoir related landslides
- Landslide related tsunamis and seiches
- Landslide disasters in urban areas and along critical infrastructure
- Landslides and natural resources
- Land development and land-use practices
- Landslide remedial measures / prevention works
- Temporal and spatial prediction of landslides
- Early warning and evacuation
- Global landslide database