Landslide susceptibility assessment in Eastern Himalayas, India: a comprehensive exploration of four novel hybrid ensemble data driven techniques integrating explainable artificial intelligence approach
Authors: Sumon Dey, Swarup Das, Sujit Kumar Roy
Journal: Environmental Earth Sciences, volume 83, issue 22 (Journal Article)
Published: 2024-11-13
DOI: 10.1007/s12665-024-11945-z
URL: https://link.springer.com/article/10.1007/s12665-024-11945-z
Abstract
Data-driven methods have brought significant advances to landslide susceptibility assessment. However, model performance depends on the geo-environmental factors used, and the appropriate set of factors varies from one location to another, which leaves a persistent gap that this study addresses. The study assesses landslide susceptibility for the Darjeeling hills in the Eastern Himalayan region using sixteen causative geo-environmental factors. Causal factors were selected through a two-stage procedure (PCC-BA) that combines Pearson’s correlation coefficient (PCC) and the Boruta algorithm (BA). The dataset was split randomly in a 70:30 ratio into training and test data, and 30% of the training data was set aside as a validation dataset. Four data-driven models, namely K-nearest neighbour (KNN), Boosted Tree (BT), Gradient Boosting Machines (GBM) and an ensembled neural network with principal component analysis (PCA-NN), were used, and four novel ensembles, namely KNN-BT, PCA-NN-BT, GBM-KNN and GBM-PCA-NN, were constructed. The susceptibility maps were classified into five classes: very low (VL), low (L), medium (M), high (H) and very high (VH). Model performance was evaluated with the area under the receiver operating characteristic curve (AUC) on the training, testing and validation datasets: KNN-BT attained 0.943, 0.889 and 0.944, respectively; PCA-NN-BT attained 0.934, 0.876 and 0.943; GBM-KNN attained 0.959, 0.897 and 0.957; and GBM-PCA-NN attained 0.956, 0.889 and 0.962. An explainable artificial intelligence (ex-AI) method, the partial dependence profile (PDP), was used to quantify the effect of the causal factors on all four ensemble models. The study aims to support better disaster mitigation policies by bridging the gap between contemporary machine learning approaches and geo-spatial applications, thereby helping to enhance the resilience of inhabitants in the landslide-prone hilly areas of Darjeeling district.
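The evaluation workflow summarised in the abstract can be illustrated with a minimal, standard-library Python sketch. It is not the authors' implementation. It assumes the validation set is carved out of the 70% training portion (one possible reading of the abstract), computes AUC with the pairwise rank definition, and computes a partial dependence profile by forcing one factor to each grid value and averaging the model output. All function names and the toy model in the usage note are hypothetical.

```python
import random


def split_70_30_with_validation(n_samples, seed=0):
    """Return (train, validation, test) index lists.

    Assumption: a random 70:30 train/test split, after which 30% of the
    training portion is held out for validation.
    """
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    n_train_full = round(0.7 * n_samples)
    train_full, test = indices[:n_train_full], indices[n_train_full:]
    n_val = round(0.3 * len(train_full))
    validation, train = train_full[:n_val], train_full[n_val:]
    return train, validation, test


def roc_auc(labels, scores):
    """AUC as the share of (positive, negative) pairs ranked correctly.

    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def partial_dependence(predict, rows, feature_index, grid):
    """Average prediction when one feature is forced to each grid value."""
    profile = []
    for value in grid:
        modified = [
            row[:feature_index] + [value] + row[feature_index + 1:]
            for row in rows
        ]
        profile.append(sum(predict(r) for r in modified) / len(modified))
    return profile
```

For example, `split_70_30_with_validation(1000)` yields 490 training, 210 validation and 300 test indices under this reading, and `partial_dependence(lambda r: 2 * r[0] + r[1], [[0, 1], [0, 3]], 0, [0, 1])` traces how a toy model's mean output changes with its first factor.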
About the journal:
Environmental Earth Sciences is an international multidisciplinary journal concerned with all aspects of interaction between humans, natural resources, ecosystems, special climates or unique geographic zones, and the earth:
Water and soil contamination caused by waste management and disposal practices
Environmental problems associated with transportation by land, air, or water
Geological processes that may impact biosystems or humans
Man-made or naturally occurring geological or hydrological hazards
Environmental problems associated with the recovery of materials from the earth
Environmental problems caused by extraction of minerals, coal, and ores, as well as oil and gas, water and alternative energy sources
Environmental impacts of exploration and recultivation
Environmental impacts of hazardous materials
Management of environmental data and information in data banks and information systems
Dissemination of knowledge on techniques, methods, approaches and experiences to improve and remediate the environment
In pursuit of these topics, the geoscientific disciplines are invited to contribute their knowledge and experience. Major disciplines include: hydrogeology, hydrochemistry, geochemistry, geophysics, engineering geology, remediation science, natural resources management, environmental climatology and biota, environmental geography, soil science and geomicrobiology.