Impact of Extreme Class-Imbalance on Landslide-Risk Prediction and Mitigation Using Two-Stage Deep Neural Network

2022 14th International Conference on Signal Processing Systems (ICSPS) Pub Date : 2022-11-01 DOI:10.1109/ICSPS58776.2022.00131

N. Tengtrairat, W. Woo, P. Parathai, C. Sundaranaga, T. Kridakorn, N. Ayutthaya, D. Rinchumphu



Abstract

Landslide classification is one of the most challenging problems because of the complex relationships among various dynamic and uncertain factors and the physical processes of data acquisition. Landslides occur frequently in the upper northern region of Thailand because of its topography. This paper proposes a landslide classification method that captures the significant features from a dataset with extreme class imbalance, a property of real-world data that remains a challenge for machine learning researchers. The study area covers 25 square kilometers in Chiang Rai, Thailand, and contains 30,408 non-landslide samples and 1,077 landslide samples. The landslide share is reported as 3.54%; from the counts given, this figure matches the ratio of landslide to non-landslide samples, while the landslide share of all 31,485 samples is about 3.42%. The proposed method has two main steps: first, mitigating the class imbalance in the dataset, and second, two-stage learning. On the enhanced dataset, the proposed method is benchmarked against a baseline, logistic regression (LR), and a random forest classifier (RFC). On the imbalanced dataset, a one-class SVM is assessed against the proposed method along with LR and RFC. Experimental results show that the proposed method improves landslide-risk prediction over the baseline, LR, RFC, and one-class SVM methods, raising the average area under the curve (AUC) score by 0.48, 0.48, 0.03, and 0.06, respectively, across the enhanced and imbalanced datasets.
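To make the quantities in the abstract concrete, the following is a minimal sketch in plain Python, not the authors' implementation. It computes the imbalance ratios from the sample counts quoted above, balances a label list by random oversampling, and computes AUC by the rank (Mann-Whitney) formula. The abstract does not say which mitigation technique is used, so random oversampling is an assumed stand-in, and the helper names `imbalance_summary`, `random_oversample`, and `roc_auc` are hypothetical.

```python
import random

# Sample counts quoted in the abstract.
N_NON_LANDSLIDE = 30_408
N_LANDSLIDE = 1_077


def imbalance_summary(n_neg, n_pos):
    """Summarize how rare the positive (landslide) class is."""
    total = n_neg + n_pos
    return {
        "share_of_total": n_pos / total,        # positives / all samples
        "ratio_to_majority": n_pos / n_neg,     # positives / negatives
        "majority_per_minority": n_neg / n_pos  # negatives per positive
    }


def random_oversample(labels, seed=0):
    """Return sample indices with the minority class resampled (with
    replacement) until both classes have the same count.

    Assumption: a generic mitigation step, not the method from the paper.
    """
    rng = random.Random(seed)
    pos = [i for i, y in enumerate(labels) if y == 1]
    neg = [i for i, y in enumerate(labels) if y == 0]
    minority, majority = (pos, neg) if len(pos) < len(neg) else (neg, pos)
    extra = [rng.choice(minority) for _ in range(len(majority) - len(minority))]
    return majority + minority + extra


def roc_auc(labels, scores):
    """Area under the ROC curve via average ranks; tied scores count as 1/2."""
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    ranks = [0.0] * len(scores)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of scores tied with position i.
        while j + 1 < len(order) and scores[order[j + 1]] == scores[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # average 1-based rank of the tied run
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    rank_sum_pos = sum(r for r, y in zip(ranks, labels) if y == 1)
    u_stat = rank_sum_pos - n_pos * (n_pos + 1) / 2
    return u_stat / (n_pos * n_neg)


if __name__ == "__main__":
    for name, value in imbalance_summary(N_NON_LANDSLIDE, N_LANDSLIDE).items():
        print(f"{name}: {value:.4f}")

    toy_labels = [0] * 95 + [1] * 5  # toy data, not the Chiang Rai dataset
    balanced = random_oversample(toy_labels)
    print("positives after oversampling:", sum(toy_labels[i] for i in balanced))

    print("toy AUC:", roc_auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]))
```

With the quoted counts, landslide samples make up about 3.42% of all samples and about 3.54% of the non-landslide count, roughly one landslide sample for every 28 non-landslide samples. AUC suits this setting because it depends only on how positives rank against negatives: a scorer that assigns every sample the same score obtains 0.5 regardless of class proportions.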


