Investigating landslide data balancing for susceptibility mapping using generative and machine learning models

IF 5.8 | Zone 2, Engineering and Technology | Q1, ENGINEERING, GEOLOGICAL
Yuhang Jiang, Wei Wang, Lifang Zou, Yajun Cao, Wei-Chau Xie
DOI: 10.1007/s10346-024-02352-3 (https://doi.org/10.1007/s10346-024-02352-3)
Journal: Landslides | Published: 2024-09-05 | Journal Article
Citations: 0

Abstract

With the development and application of machine learning, significant advances have been made in landslide susceptibility mapping. However, field investigation of landslides is difficult. As a result, current landslide susceptibility mapping typically suffers from too few landslide samples (positive samples) and unreliable non-landslide samples (negative samples).

Taking Lianghe County, Yunnan Province, China, as a case study, this paper investigates how effective three oversampling models are at generating positive landslide samples:
- Conditional Tabular Generative Adversarial Networks (CTGAN)
- Generative Adversarial Networks (GAN)
- the traditional Synthetic Minority Oversampling Technique (SMOTE)

Three machine learning classifiers are used for landslide susceptibility assessment:
- a one-dimensional Convolutional Neural Network-Long Short-Term Memory network (1D CNN-LSTM)
- Random Forest (RF)
- Gradient Boosting Decision Tree (GBDT)

We also devise a method for screening non-landslide data (negative samples) that uses a self-trained support vector machine within a semi-supervised framework.

Training on the dataset after negative sample screening significantly improves the AUC values of the 1D CNN-LSTM, RF, and GBDT models. Their AUC values rise from 0.778, 0.869, and 0.849 to 0.837, 0.936, and 0.877, respectively. Compared with the original training set, training on data augmented by CTGAN, GAN, or SMOTE improves the prediction accuracy of all three machine learning models. The best result for each model is as follows:
- RF augmented with 200 positive samples generated by CTGAN achieves the highest prediction accuracy in the study (AUC = 0.962).
- 1D CNN-LSTM achieves its highest prediction accuracy (AUC = 0.953) when augmented with 200 positive samples from GAN.
- GBDT reaches its highest prediction accuracy (AUC = 0.928) when augmented with 200 positive samples created by SMOTE.

In addition, the spatial distribution of the data indicates that samples generated by the generative adversarial models exhibit higher diversity. These samples can therefore be used for landslide susceptibility assessment.
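
The abstract uses SMOTE as the traditional oversampling baseline against the two GAN-based generators. SMOTE creates each synthetic minority sample by interpolating between a real minority sample and one of its k nearest minority neighbours.

The NumPy sketch below shows that interpolation step. It assumes the landslide conditioning factors are already numeric and scaled to comparable ranges. Several details are illustrative assumptions rather than the authors' configuration:
- the function name `smote`
- the default k = 5
- the random example data

Only the count of 200 synthetic samples mirrors the abstract.

```python
import numpy as np


def smote(minority: np.ndarray, n_new: int, k: int = 5, seed: int = 0) -> np.ndarray:
    """SMOTE-style oversampling (illustrative sketch, hypothetical helper).

    Each synthetic row lies on the segment between a randomly chosen
    minority sample and one of its k nearest minority neighbours.
    """
    rng = np.random.default_rng(seed)
    n = minority.shape[0]
    if n < 2:
        raise ValueError("SMOTE needs at least two minority samples")
    k = min(k, n - 1)

    # Pairwise Euclidean distances between minority (landslide) samples.
    diff = minority[:, None, :] - minority[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))
    np.fill_diagonal(dist, np.inf)               # a sample is not its own neighbour
    neighbours = np.argsort(dist, axis=1)[:, :k]  # indices of the k nearest neighbours

    base = rng.integers(0, n, size=n_new)                     # seed sample per new row
    pick = neighbours[base, rng.integers(0, k, size=n_new)]   # one random neighbour each
    gap = rng.random((n_new, 1))                              # interpolation weight in [0, 1)
    return minority[base] + gap * (minority[pick] - minority[base])


# Hypothetical example: 30 landslide samples with 4 scaled conditioning factors.
positives = np.random.default_rng(1).random((30, 4))
synthetic = smote(positives, n_new=200)
augmented_positives = np.vstack([positives, synthetic])
print(augmented_positives.shape)  # prints (230, 4)
```

Every synthetic row is a convex combination of two observed landslide samples. SMOTE output therefore never leaves the segments between neighbouring real positives, whereas a trained generator such as GAN or CTGAN is not bound by that constraint.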

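The abstract does not describe how the self-trained SVM screens negative samples. One common way to set up such a self-training loop is sketched below. The sketch rests on several assumptions, none taken from the paper:
- A small seed of trusted non-landslide points is available.
- A linear SVM trained with Pegasos-style subgradient descent stands in for whatever SVM implementation the authors used.
- A candidate counts as confident when its decision score lies beyond the margin (|score| >= 1).

All function names and the toy data are hypothetical.

```python
import numpy as np


def train_linear_svm(X: np.ndarray, y: np.ndarray, lam: float = 0.01,
                     epochs: int = 100, seed: int = 0) -> np.ndarray:
    """Linear SVM via Pegasos-style subgradient descent; y must be in {-1, +1}.

    The bias is folded into the weights as a constant feature (and is
    therefore regularized too, a common simplification).
    """
    rng = np.random.default_rng(seed)
    Xb = np.hstack([X, np.ones((len(X), 1))])
    w = np.zeros(Xb.shape[1])
    t = 0
    for _ in range(epochs):
        for i in rng.permutation(len(Xb)):
            t += 1
            eta = 1.0 / (lam * t)               # Pegasos step size
            violates = y[i] * (Xb[i] @ w) < 1   # hinge loss is active for this point
            w *= 1.0 - eta * lam                # shrink step from the L2 penalty
            if violates:
                w += eta * y[i] * Xb[i]         # subgradient step from the hinge term
    return w


def decision(X: np.ndarray, w: np.ndarray) -> np.ndarray:
    """Signed SVM score; positive means landslide-like."""
    return np.hstack([X, np.ones((len(X), 1))]) @ w


def self_train_screen(X_pos, X_neg_seed, X_cand, threshold=1.0, max_iter=10):
    """Hypothetical negative-sample screening by self-training.

    Known landslides are labelled +1 and a trusted seed of non-landslides -1.
    Each round, unlabelled candidates with |score| >= threshold are
    pseudo-labelled and added to the training set. Returns the candidates
    that the final model places confidently on the non-landslide side.
    """
    X_lab = np.vstack([X_pos, X_neg_seed])
    y_lab = np.concatenate([np.ones(len(X_pos)), -np.ones(len(X_neg_seed))])
    pending = np.arange(len(X_cand))            # indices of still-unlabelled candidates
    for _ in range(max_iter):
        if len(pending) == 0:
            break
        w = train_linear_svm(X_lab, y_lab)
        scores = decision(X_cand[pending], w)
        confident = np.abs(scores) >= threshold
        if not confident.any():
            break
        X_lab = np.vstack([X_lab, X_cand[pending[confident]]])
        y_lab = np.concatenate([y_lab, np.sign(scores[confident])])
        pending = pending[~confident]
    w = train_linear_svm(X_lab, y_lab)          # refit on all pseudo-labels
    return X_cand[decision(X_cand, w) <= -threshold]


# Toy 2-D example (made-up data): landslides near (2, 2), non-landslides near (-2, -2).
rng = np.random.default_rng(42)
landslides = rng.normal(2.0, 0.5, size=(40, 2))
seed_negatives = rng.normal(-2.0, 0.5, size=(10, 2))
candidates = np.vstack([
    rng.normal(-2.0, 0.5, size=(50, 2)),   # plausible non-landslides
    rng.normal(2.0, 0.5, size=(10, 2)),    # landslide-like; should be screened out
])
screened = self_train_screen(landslides, seed_negatives, candidates)
print(f"kept {len(screened)} of {len(candidates)} candidate negatives")
```

In this sketch, candidates that fall on the landslide side of the boundary or inside the margin are dropped rather than kept as negatives. Random non-landslide points that resemble landslides in feature space would otherwise add label noise to the training set.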

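The abstract reports model performance as AUC, the area under the receiver operating characteristic (ROC) curve. AUC equals the probability that a randomly chosen positive (landslide) sample receives a higher score than a randomly chosen negative one, with ties counted as half.

The pure-Python sketch below computes AUC directly from that definition. The labels and scores in the example are made up for illustration.

```python
from typing import Sequence


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """ROC AUC as P(score of a random positive > score of a random negative).

    Ties count as half, matching the Mann-Whitney U formulation.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative sample")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0      # positive correctly ranked above negative
            elif p == n:
                wins += 0.5      # tie counts as half a win
    return wins / (len(pos) * len(neg))


labels = [1, 1, 0, 0, 1, 0]                # 1 = landslide, 0 = non-landslide
scores = [0.9, 0.7, 0.4, 0.6, 0.35, 0.1]   # hypothetical susceptibility scores
auc = roc_auc(labels, scores)
print(f"AUC = {auc:.3f}")  # prints "AUC = 0.778" (7 of 9 pairs ranked correctly)
```

The pairwise loop costs O(P x N). For susceptibility maps with many evaluation points, a rank-based computation or a library routine such as scikit-learn's `roc_auc_score` gives the same value more efficiently.
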
Source journal: Landslides
Category: Earth Sciences (Geosciences, Multidisciplinary)
CiteScore: 13.60
Self-citation rate: 14.90%
Articles published: 191
Review time: >12 weeks
About the journal: Landslides are gravitational mass movements of rock, debris or earth. They may occur in conjunction with other major natural disasters such as floods, earthquakes and volcanic eruptions. Expanding urbanization and changing land-use practices have increased the incidence of landslide disasters. As catastrophic events, landslides can cause human injury, loss of life and economic devastation. They are studied within the earth, water and engineering sciences.

The journal Landslides aims to be a common platform for publishing integrated research on landslide processes, hazards, risk analysis and mitigation, and on the protection of our cultural heritage and the environment. It publishes research papers, news of recent landslide events, and information on the activities of the International Consortium on Landslides. Topics include:
- Landslide dynamics, mechanisms and processes
- Landslide risk evaluation: hazard assessment, hazard mapping, and vulnerability assessment
- Geological, geotechnical, hydrological and geophysical modeling
- Effects of meteorological, hydrological and global climatic change factors
- Monitoring, including remote sensing and other non-invasive systems
- New technology, expert and intelligent systems
- Application of GIS techniques
- Rock slides, rock falls, debris flows, earth flows, and lateral spreads
- Large-scale landslides, lahars and pyroclastic flows in volcanic zones
- Marine and reservoir-related landslides
- Landslide-related tsunamis and seiches
- Landslide disasters in urban areas and along critical infrastructure
- Landslides and natural resources
- Land development and land-use practices
- Landslide remedial measures and prevention works
- Temporal and spatial prediction of landslides
- Early warning and evacuation
- Global landslide database