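Landslide susceptibility prediction and mapping in the Taihang mountainous area based on a machine learning model optimized with a genetic algorithm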

IF 2.7 · CAS Zone 4 (Earth Sciences) · JCR Q2 · COMPUTER SCIENCE, INTERDISCIPLINARY APPLICATIONS
Junjie Jiang, Qizhi Wang, Shihao Luan, Minghui Gao, Huijie Liang, Jun Zheng, Wei Yuan, Xiaolei Ji
Journal: Earth Science Informatics
Publication date: 2024-08-31
Publication type: Journal Article
DOI: 10.1007/s12145-024-01470-9 (https://doi.org/10.1007/s12145-024-01470-9)
Citations: 0

Landslide susceptibility prediction and mapping in Taihang mountainous area based on optimized machine learning model with genetic algorithm

The Taihang Mountains in China span numerous cities, and frequent landslides in the mountainous areas endanger residents' lives and property, so landslide prevention and control in the region is of great significance. Landslide susceptibility mapping (LSM) is currently often performed with a single model, but the accuracy of the results fails to meet the demands of early warning, prevention, and control. This paper takes the Taihang Mountain area as the study area, collects data on potential landslide sites and the related influencing factors, and couples the information value (IV) method with Random Forest (RF) and Extreme Gradient Boosting (XGB) to build composite machine learning models, which are then optimized with a genetic algorithm (GA). Accuracy, precision, recall, F1 score, AUC, and Taylor diagrams are used to evaluate the models, and susceptibility maps are generated for comparative analysis. The results show that the IV-GA-RF model performs best (accuracy = 0.956, precision = 0.96, recall = 0.953, F1 score = 0.957, AUC = 0.946 for the testing set, AUC = 0.929 for the training set); compared with the unoptimized composite model, these six metrics improve by 0.044, 0.051, 0.046, 0.044, 0.021, and 0.020, respectively. The IV-GA-RF model also shows a significant advantage over the IV-GA-XGB model, which is likewise optimized with the GA. The susceptibility map produced by the IV-GA-RF model is the more accurate of the two, as assessed by the Seed Cell Area Index (SCAI) method.
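The information value (IV, sometimes rendered "information quantity") method named in the abstract is commonly written as IV_i = ln((N_i / N) / (S_i / S)), where N_i and S_i are the landslide count and cell count of class i of a conditioning factor, and N and S are their totals. The sketch below assumes that common form; the paper's exact formulation is not given here, and the function name, the sample counts, and the handling of empty classes are illustrative assumptions.

```python
import math


def information_value(landslides_in_class, cells_in_class):
    """Return the information value of each class of one conditioning factor.

    IV_i = ln((N_i / N) / (S_i / S)); every class is assumed to contain
    at least one raster cell.
    """
    total_landslides = sum(landslides_in_class)
    total_cells = sum(cells_in_class)
    values = []
    for n_i, s_i in zip(landslides_in_class, cells_in_class):
        if n_i == 0:
            # ln(0) is undefined; assigning 0.0 here is an assumption of this sketch.
            values.append(0.0)
            continue
        density_ratio = (n_i / total_landslides) / (s_i / total_cells)
        values.append(math.log(density_ratio))
    return values


# Hypothetical slope classes: landslide counts and raster cell counts per class.
slope_landslides = [5, 20, 25]
slope_cells = [5000, 3000, 2000]
slope_iv = information_value(slope_landslides, slope_cells)
```

A positive value marks a class where landslides are denser than the study-area average. In a coupled model of the kind the abstract describes, such values would typically replace the raw class labels as inputs to RF or XGB.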
Slope, rainfall, seismicity, and stratigraphic lithology are the four key factors controlling landslide occurrence in the study area. In summary, the IV-GA-RF model is an effective model for analyzing landslide hazards: it provides a reference for research in this field and scientific support for disaster prevention and control in the study area, and the composite optimization approach offers a new perspective for the field.
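The Seed Cell Area Index (SCAI) check mentioned in the abstract is commonly defined as the percentage of map area in a susceptibility class divided by the percentage of landslide (seed) cells falling in that class. The sketch below assumes that definition; the class counts are hypothetical and are not taken from the paper.

```python
def scai(class_cells, class_landslides):
    """Return SCAI per susceptibility class: area share divided by landslide share."""
    total_cells = sum(class_cells)
    total_landslides = sum(class_landslides)
    result = []
    for cells, slides in zip(class_cells, class_landslides):
        area_pct = 100.0 * cells / total_cells
        slide_pct = 100.0 * slides / total_landslides
        # A class with no landslides has an unbounded SCAI; reported as infinity here.
        result.append(area_pct / slide_pct if slide_pct > 0 else float("inf"))
    return result


# Hypothetical map, classes ordered: very low, low, moderate, high, very high.
cells = [4000, 3000, 1500, 1000, 500]
landslides = [2, 8, 20, 30, 40]
scai_values = scai(cells, landslides)
```

A well-behaved map shows SCAI falling steadily from the lowest to the highest susceptibility class, since the high classes should concentrate many landslides in a small area.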

Source journal
Earth Science Informatics (COMPUTER SCIENCE, INTERDISCIPLINARY APPLICATIONS; GEOSCIENCES, MULTIDISCIPLINARY)
CiteScore: 4.60
Self-citation rate: 3.60%
Articles published: 157
Review time: 4.3 months
Journal description: The Earth Science Informatics [ESIN] journal aims at rapid publication of high-quality, current, cutting-edge, and provocative scientific work in the area of Earth Science Informatics as it relates to Earth systems science and space science. This includes articles on the application of formal and computational methods, computational Earth science, spatial and temporal analyses, and all aspects of computer applications to the acquisition, storage, processing, interchange, and visualization of data and information about the materials, properties, processes, features, and phenomena that occur at all scales and locations in the Earth system's five components (atmosphere, hydrosphere, geosphere, biosphere, cryosphere) and in space (see "About this journal" for more detail). The quarterly journal publishes research, methodology, and software articles, as well as editorials, comments, and book and software reviews. Review articles of relevant findings, topics, and methodologies are also considered.