Benchmarking data handling strategies for landslide susceptibility modeling using random forest workflows
Guruh Samodra, Ngadisih, Ferman Setia Nugroho
Artificial Intelligence in Geosciences, Volume 5, Article 100093 (journal article, published 2024-10-05)
DOI: 10.1016/j.aiig.2024.100093
https://www.sciencedirect.com/science/article/pii/S2666544124000340
Citation count: 0
Abstract
Machine learning (ML) algorithms are frequently used in landslide susceptibility modeling. However, different data handling strategies can produce different susceptibility models even when the same ML algorithm is used. This research compares combinations of inventory data handling, cross-validation (CV), and hyperparameter tuning strategies for generating landslide susceptibility maps, with the aim of deriving a general strategy for ML-based landslide susceptibility modeling. The authors used eight inventory data handling scenarios to convert each landslide polygon into landslide points: a single point at the toe (minimum elevation), at the scarp (maximum elevation), or at the center of the landslide; one, three, five, or ten points placed randomly inside the polygon; and points sampled on a 15 m grid. For each data handling scenario, three random forest workflows were applied: non-spatial CV with non-spatial hyperparameter tuning; spatial CV with spatial hyperparameter tuning; and spatial CV with forward feature selection and no hyperparameter tuning. The resulting 24 (8 × 3) random forest workflows were applied to a complete inventory of 743 landslides triggered by Tropical Cyclone Cempaka (2017) in Pacitan Regency, Indonesia, using 11 landslide controlling factors. The results show that grid sampling combined with spatial CV and spatial hyperparameter tuning is favorable because it can minimize overfitting, produce a predictive model with relatively high performance, and reduce susceptibility artifacts in landslide areas. Inventory data handling, CV, and hyperparameter tuning strategies should therefore be chosen carefully to improve the practical applicability of landslide susceptibility maps.
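The eight polygon-to-point scenarios can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not say how the toe, scarp, or center points were derived, so the rules below (lowest and highest grid cell for toe and scarp, area centroid for the center) are assumptions. Polygons are assumed to be vertex lists in a projected CRS in metres, `elevation(x, y)` is a hypothetical stand-in for a DEM lookup, and the scenario labels (`"toe"`, `"random3"`, `"grid15"`, and so on) and the `dem_res` parameter are illustrative names only.

```python
import math
import random
from typing import Callable, List, Tuple

Point = Tuple[float, float]
Polygon = List[Point]  # vertices in a projected CRS (metres); ring is not explicitly closed


def point_in_polygon(p: Point, poly: Polygon) -> bool:
    """Ray casting: p is inside if a ray towards +x crosses an odd number of edges."""
    x, y = p
    inside = False
    n = len(poly)
    for i in range(n):
        x1, y1 = poly[i]
        x2, y2 = poly[(i + 1) % n]
        if (y1 > y) != (y2 > y):  # edge straddles the horizontal line through p
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def grid_points(poly: Polygon, spacing: float) -> List[Point]:
    """Centres of a regular grid (given spacing) that fall inside the polygon.

    Returns an empty list if no grid centre falls inside (e.g. very small polygons).
    """
    xs, ys = [v[0] for v in poly], [v[1] for v in poly]
    x0, y0 = min(xs), min(ys)
    nx = math.ceil((max(xs) - x0) / spacing)
    ny = math.ceil((max(ys) - y0) / spacing)
    pts = []
    for i in range(nx):
        for j in range(ny):
            p = (x0 + (i + 0.5) * spacing, y0 + (j + 0.5) * spacing)
            if point_in_polygon(p, poly):
                pts.append(p)
    return pts


def centroid(poly: Polygon) -> Point:
    """Area centroid via the shoelace formula; may lie outside a concave polygon."""
    a = cx = cy = 0.0
    n = len(poly)
    for i in range(n):
        x1, y1 = poly[i]
        x2, y2 = poly[(i + 1) % n]
        cross = x1 * y2 - x2 * y1
        a += cross
        cx += (x1 + x2) * cross
        cy += (y1 + y2) * cross
    a *= 0.5
    return (cx / (6 * a), cy / (6 * a))


def random_points(poly: Polygon, k: int, rng: random.Random) -> List[Point]:
    """k points drawn uniformly inside the polygon by rejection sampling in its bounding box."""
    xs, ys = [v[0] for v in poly], [v[1] for v in poly]
    pts: List[Point] = []
    while len(pts) < k:
        p = (rng.uniform(min(xs), max(xs)), rng.uniform(min(ys), max(ys)))
        if point_in_polygon(p, poly):
            pts.append(p)
    return pts


def landslide_points(poly: Polygon, scenario: str,
                     elevation: Callable[[float, float], float],
                     rng: random.Random, dem_res: float = 5.0) -> List[Point]:
    """Convert one landslide polygon into points for a hypothetical scenario label."""
    if scenario in ("toe", "scarp"):
        # Approximate DEM cells by grid centres at dem_res; take the lowest or highest one.
        cells = grid_points(poly, dem_res)
        pick = min if scenario == "toe" else max
        return [pick(cells, key=lambda p: elevation(*p))]
    if scenario == "center":
        return [centroid(poly)]
    if scenario.startswith("random"):  # "random1", "random3", "random5", "random10"
        return random_points(poly, int(scenario[len("random"):]), rng)
    if scenario == "grid15":
        return grid_points(poly, 15.0)
    raise ValueError(f"unknown scenario: {scenario}")


if __name__ == "__main__":
    rect = [(0.0, 0.0), (60.0, 0.0), (60.0, 30.0), (0.0, 30.0)]
    slope = lambda x, y: 100.0 + 0.5 * y  # toy terrain: elevation rises towards +y
    rng = random.Random(0)
    for s in ["toe", "scarp", "center", "random3", "grid15"]:
        print(s, landslide_points(rect, s, slope, rng))
```

For the toy 60 m × 30 m rectangle, the single-point scenarios each yield one point, while the 15 m grid yields eight points, so larger landslides contribute proportionally more samples under grid sampling. Rejection sampling is simple but slow for thin or highly irregular polygons, and a production workflow would more likely use a GIS library's polygon operations.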
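The workflows differ in whether CV folds are formed randomly or spatially. The abstract does not specify the spatial partitioning scheme, so the sketch below assumes a simple square-block partition (`block_size` is a hypothetical parameter) and compares it with conventional random k-fold on a synthetic toy inventory. It covers fold assignment only, not model fitting. When each landslide contributes several points, as in the random-points and grid scenarios, random folds tend to place points from the same landslide in both the training and test sets, which can make accuracy estimates optimistic; block folds keep each landslide within one fold as long as it does not straddle a block boundary.

```python
import random
from collections import defaultdict
from typing import Dict, List, Tuple


def random_kfold(n: int, k: int, rng: random.Random) -> List[int]:
    """Conventional (non-spatial) CV: shuffle sample indices and deal them into k folds."""
    order = list(range(n))
    rng.shuffle(order)
    folds = [0] * n
    for pos, i in enumerate(order):
        folds[i] = pos % k
    return folds


def spatial_block_kfold(coords: List[Tuple[float, float]], k: int,
                        block_size: float, rng: random.Random) -> List[int]:
    """Spatial CV (square-block variant): all samples in the same block share a fold."""
    blocks = [(int(x // block_size), int(y // block_size)) for x, y in coords]
    unique = sorted(set(blocks))
    rng.shuffle(unique)
    block_fold: Dict[Tuple[int, int], int] = {b: pos % k for pos, b in enumerate(unique)}
    return [block_fold[b] for b in blocks]


def split_landslides(groups: List[int], folds: List[int]) -> int:
    """Number of landslides whose points are spread over more than one fold."""
    seen = defaultdict(set)
    for g, f in zip(groups, folds):
        seen[g].add(f)
    return sum(1 for fs in seen.values() if len(fs) > 1)


if __name__ == "__main__":
    rng = random.Random(42)
    coords, groups = [], []
    # Toy inventory: 20 landslides 1 km apart, 10 points each within +/-40 m of the centre.
    for g in range(20):
        cx, cy = 1000.0 * g + 500.0, 500.0
        for _ in range(10):
            coords.append((cx + rng.uniform(-40, 40), cy + rng.uniform(-40, 40)))
            groups.append(g)
    print("random CV, landslides split across folds:",
          split_landslides(groups, random_kfold(len(coords), 5, rng)))
    print("spatial CV, landslides split across folds:",
          split_landslides(groups, spatial_block_kfold(coords, 5, 1000.0, rng)))
```

In a nested setup, spatial hyperparameter tuning would presumably repeat the same kind of spatial partitioning on the training folds of each outer split; that inner loop is not shown here.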