Junjie Jiang, Qizhi Wang, Shihao Luan, Minghui Gao, Huijie Liang, Jun Zheng, Wei Yuan, Xiaolei Ji
Earth Science Informatics (JCR Q2, Computer Science, Interdisciplinary Applications; impact factor 2.7). Published 2024-08-31. DOI: 10.1007/s12145-024-01470-9 (https://doi.org/10.1007/s12145-024-01470-9)
Landslide susceptibility prediction and mapping in Taihang mountainous area based on optimized machine learning model with genetic algorithm
Landslides occur frequently in the Taihang Mountains of China, which span numerous cities, and they threaten the lives and property of residents. Landslide prevention and control in the region is therefore of great importance. Landslide susceptibility mapping (LSM) is currently often carried out with a single model, but the accuracy of the results does not meet the demands of early warning, prevention, and control. This paper takes the Taihang Mountain area as the study area, collects potential landslide hazard points and data on the related influencing factors, and couples the information value (IV) method with Random Forest (RF) and Extreme Gradient Boosting (XGB) to build composite machine learning models, which are then optimized with a Genetic Algorithm (GA). The models are evaluated using accuracy, recall, precision, F1 score, AUC, and a Taylor diagram, and susceptibility maps are generated for comparative analysis. The results show that the IV-GA-RF model performs best (accuracy = 0.956, precision = 0.96, recall = 0.953, F1 score = 0.957, AUC = 0.946 for the testing set, AUC = 0.929 for the training set); compared with the unoptimized composite model, these metrics improve by 0.044, 0.051, 0.046, 0.044, 0.021, and 0.020, respectively. The IV-GA-RF model also shows a significant advantage over the IV-GA-XGB model, which was likewise optimized with the GA. As assessed by the Seed Cell Area Index (SCAI) method, the susceptibility map produced by the IV-GA-RF model is the more accurate of the two. Four factors, namely slope, rainfall, seismicity, and stratigraphic lithology, are crucial in determining landslide occurrence in the study area. In summary, the IV-GA-RF model is an effective model for analyzing landslide disasters; it provides a reference for research in this field and scientific insight for disaster prevention and control in the study area, and the concept of a composite optimized model offers a new perspective for the field.
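The abstract names the information value method and the evaluation metrics but does not give their formulas. As a rough illustration only, the sketch below assumes the commonly used definition of information value for a factor class, IV_i = ln((N_i / N) / (S_i / S)), where N_i and S_i are the landslide count and cell count of class i and N and S are the totals, together with the standard confusion-matrix definitions of accuracy, precision, recall, and F1. The function names and the sample numbers are hypothetical and are not taken from the paper.

```python
import math


def information_value(landslides_in_class, cells_in_class):
    """Information value per factor class (assumed common definition).

    IV_i = ln((N_i / N) / (S_i / S)). Classes with no landslides have an
    undefined logarithm and are returned as None in this sketch.
    """
    total_landslides = sum(landslides_in_class)
    total_cells = sum(cells_in_class)
    values = []
    for n_i, s_i in zip(landslides_in_class, cells_in_class):
        if n_i == 0 or s_i == 0:
            values.append(None)  # undefined; handling is left to the caller
            continue
        landslide_share = n_i / total_landslides
        area_share = s_i / total_cells
        values.append(math.log(landslide_share / area_share))
    return values


def classification_metrics(y_true, y_pred):
    """Accuracy, precision, recall, and F1 for binary labels (1 = landslide)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return {
        "accuracy": (tp + tn) / len(pairs),
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }


# Hypothetical example: two slope classes with equal area but unequal
# landslide counts, and six labelled test cells.
slope_iv = information_value([30, 10], [100, 100])
scores = classification_metrics([1, 1, 1, 0, 0, 0], [1, 1, 0, 0, 0, 1])
```

A positive information value marks a class where landslides are denser than the study-area average and a negative value marks one where they are sparser. In a coupled model of the kind described, such values would replace the raw factor classes as inputs to the classifier; how the authors handle empty classes or encode the inputs is not stated in the abstract.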
About the journal:
The Earth Science Informatics [ESIN] journal aims at rapid publication of high-quality, current, cutting-edge, and provocative scientific work in the area of Earth Science Informatics as it relates to Earth systems science and space science. This includes articles on the application of formal and computational methods, computational Earth science, spatial and temporal analyses, and all aspects of computer applications to the acquisition, storage, processing, interchange, and visualization of data and information about the materials, properties, processes, features, and phenomena that occur at all scales and locations in the Earth system’s five components (atmosphere, hydrosphere, geosphere, biosphere, cryosphere) and in space (see "About this journal" for more detail). The quarterly journal publishes research, methodology, and software articles, as well as editorials, comments, and book and software reviews. Review articles of relevant findings, topics, and methodologies are also considered.