Praveen Kumar, Priyanka Priyanka, K. V. Uday, Varun Dutt
Natural Hazards and Earth System Sciences, published 2024-06-06. DOI: 10.5194/nhess-24-1913-2024 (https://doi.org/10.5194/nhess-24-1913-2024)
Addressing class imbalance in soil movement predictions
Abstract. Landslides threaten human life and infrastructure, causing fatalities and economic losses. Monitoring stations provide valuable data for predicting soil movement, which is crucial to mitigating this threat. Accurately predicting soil movement from monitoring data is challenging because the data are complex and inherently class-imbalanced. This study proposes machine learning (ML) models combined with oversampling techniques to address the class imbalance and to build a robust soil movement prediction system. The dataset, comprising 2 years (2019–2021) of monitoring data from a landslide in Uttarakhand, was split 70:30 into training and testing sets. To tackle the class imbalance, several oversampling techniques, including the synthetic minority oversampling technique (SMOTE), K-means SMOTE, borderline-SMOTE, and adaptive SMOTE (ADASYN), were applied to the training dataset. Several ML models, namely random forest (RF), extreme gradient boosting (XGBoost), light gradient boosting machine (LightGBM), adaptive boosting (AdaBoost), category boosting (CatBoost), long short-term memory (LSTM), multilayer perceptron (MLP), and a dynamic ensemble, were trained and compared for soil movement prediction. The models were optimized with 5-fold cross-validation on the training data and then evaluated on the testing set. The dynamic ensemble model with K-means SMOTE performed best in testing, with accuracy, precision, recall, and F1 score all equal to 0.995. Models trained without oversampling performed poorly in both training and testing, which highlights the importance of oversampling for improving predictive capability.
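The workflow in the abstract (a 70:30 split, then oversampling the training portion only) can be illustrated with a minimal, standard-library sketch. It is not the authors' implementation: the toy data, the per-class split, the simplified SMOTE-style interpolation (a random base point rather than a pass over every minority sample), and all names such as `smote`, `split_70_30`, and `precision_recall_f1` are assumptions made for illustration. The last lines show why accuracy alone can mislead on imbalanced data.

```python
import math
import random


def smote(minority, n_new, k=5, seed=0):
    """Simplified SMOTE: interpolate between a minority point and a near neighbour."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # k nearest minority neighbours of the base point (excluding itself)
        neighbours = sorted(
            (p for p in minority if p is not base),
            key=lambda p: math.dist(base, p),
        )[:k]
        other = rng.choice(neighbours)
        gap = rng.random()  # position on the segment from base to other
        synthetic.append(tuple(b + gap * (o - b) for b, o in zip(base, other)))
    return synthetic


def split_70_30(points):
    """Assumed per-class 70:30 split using integer arithmetic."""
    cut = len(points) * 7 // 10
    return points[:cut], points[cut:]


def precision_recall_f1(y_true, y_pred, positive=1):
    tp = sum(t == positive and p == positive for t, p in zip(y_true, y_pred))
    fp = sum(t != positive and p == positive for t, p in zip(y_true, y_pred))
    fn = sum(t == positive and p != positive for t, p in zip(y_true, y_pred))
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


# Made-up 2-D data: 90 "no movement" samples and 10 "movement" samples.
rng = random.Random(42)
majority = [(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(90)]
minority = [(rng.gauss(3, 1), rng.gauss(3, 1)) for _ in range(10)]

train_major, test_major = split_70_30(majority)
train_minor, test_minor = split_70_30(minority)

# Oversample the training minority only; the test set keeps its real imbalance.
synthetic = smote(train_minor, n_new=len(train_major) - len(train_minor))
balanced_minor = train_minor + synthetic
print(len(train_major), len(balanced_minor))  # 63 63

# A classifier that always predicts the majority class looks accurate but finds nothing.
y_true = [0] * len(test_major) + [1] * len(test_minor)
y_pred = [0] * len(y_true)
accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
print(accuracy, precision_recall_f1(y_true, y_pred))  # 0.9 (0.0, 0.0, 0.0)
```

Keeping the synthetic samples out of the test set matters: if oversampling ran before the split, interpolated copies of test points could leak into training and inflate the reported scores.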