High embankment slope stability prediction using data augmentation and explainable ensemble learning

IF 8.5 | Zone 1, Engineering & Technology | Q1 COMPUTER SCIENCE, INTERDISCIPLINARY APPLICATIONS

Computer-Aided Civil and Infrastructure Engineering Pub Date : 2025-04-11 DOI:10.1111/mice.13478

Zongyu Zhang, Junjie Huang, Qian Su, Shijie Liu, Naeem Mangi, Qi Zhang, Allen A. Zhang, Yao Liu, Shengyang Wang


Citations: 0

Abstract


High embankment slope stability prediction using data augmentation and explainable ensemble learning

The stability of embankment slopes for heavy-haul railway foundations is essential for safe railway operations. Railway embankment slope stability datasets often rely on engineering judgment for analysis. The labor- and resource-intensive processes of data preparation result in small dataset sizes. Machine learning analysis of small-sample potential features is a key low-cost approach for slope prediction. Due to the limited availability of slope failure data, a specialized framework is required for predictive modeling. To address this challenge, the focus is placed on data augmentation and interpretability analysis. A generative adversarial model is constructed using a graph convolutional network (GCN)-based generator and a gated recurrent unit (GRU)-based discriminator, accompanied by a quality control method for the generated samples based on maximum mean discrepancy (MMD) and a one-class support vector machine (SVM). This approach is designed to more effectively capture the temporal and spatial features of small samples. Three ensemble learning models, namely XGBoost, random forest, and AdaBoost, are trained with the augmented data, and model interpretation is conducted using SHapley Additive exPlanations (SHAP) to identify key factors affecting stability and potential stability improvement strategies. Results indicate that the proposed generative adversarial model surpasses traditional models in generating adequate data; the three machine learning models trained on augmented data achieved at least a 12% improvement in predictive accuracy compared to their counterparts trained on the original small sample; and the proposed data augmentation method outperformed variational autoencoder and diffusion models in generating high-quality synthetic data. Additionally, the interpretability framework effectively identified the primary factors influencing slope stability. These findings provide a robust framework for interpretability-driven assessments of heavy-haul railway slopes with limited sample data.
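The abstract names maximum mean discrepancy (MMD) as one half of the quality control applied to generated samples. The sketch below illustrates only that half, and only as a general technique: it computes a biased estimate of squared MMD between a real and a synthetic sample and accepts the synthetic batch when the value falls below a threshold. The RBF kernel, the `gamma` bandwidth, the `threshold` value, the toy Gaussian data, and all function names are assumptions made for illustration; the abstract does not give the authors' actual settings, and the one-class SVM step is not covered here.

```python
import math
import random


def rbf_kernel(x, y, gamma):
    """Gaussian (RBF) kernel k(x, y) = exp(-gamma * ||x - y||^2)."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-gamma * sq_dist)


def mean_kernel(xs, ys, gamma):
    """Average kernel value over all pairs (x, y), x from xs and y from ys."""
    total = sum(rbf_kernel(x, y, gamma) for x in xs for y in ys)
    return total / (len(xs) * len(ys))


def mmd_squared(real, synthetic, gamma=0.5):
    """Biased (V-statistic) estimate of squared MMD between two samples.

    gamma is an assumed kernel bandwidth, not a value from the paper.
    """
    return (mean_kernel(real, real, gamma)
            + mean_kernel(synthetic, synthetic, gamma)
            - 2.0 * mean_kernel(real, synthetic, gamma))


def accept_batch(real, synthetic, threshold=0.1, gamma=0.5):
    """Accept a synthetic batch if its squared MMD to the real data is small.

    threshold is a hypothetical cut-off chosen for this toy example.
    """
    return mmd_squared(real, synthetic, gamma) <= threshold


# Toy two-feature data standing in for slope records (purely illustrative).
rng = random.Random(0)
real = [[rng.gauss(0, 1), rng.gauss(0, 1)] for _ in range(60)]
close = [[rng.gauss(0, 1), rng.gauss(0, 1)] for _ in range(60)]
far = [[rng.gauss(3, 1), rng.gauss(3, 1)] for _ in range(60)]

print("MMD^2, same distribution:   ", mmd_squared(real, close))
print("MMD^2, shifted distribution:", mmd_squared(real, far))
print("shifted batch accepted?     ", accept_batch(real, far))
```

A batch drawn from a shifted distribution yields a much larger squared MMD than one drawn from the same distribution as the real data, which is what makes the statistic usable as a filter. In practice the bandwidth and threshold would need to be tuned to the dataset.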


Source journal

Computer-Aided Civil and Infrastructure Engineering (Engineering & Technology - Engineering: Civil)

CiteScore: 17.60
Self-citation rate: 19.80%
Articles published: 146
Review time: 1 month

Journal description: Computer-Aided Civil and Infrastructure Engineering stands as a scholarly, peer-reviewed archival journal, serving as a vital link between advancements in computer technology and civil and infrastructure engineering. The journal serves as a distinctive platform for the publication of original articles, spotlighting novel computational techniques and inventive applications of computers. Specifically, it concentrates on recent progress in computer and information technologies, fostering the development and application of emerging computing paradigms. Encompassing a broad scope, the journal addresses bridge, construction, environmental, highway, geotechnical, structural, transportation, and water resources engineering. It extends its reach to the management of infrastructure systems, covering domains such as highways, bridges, pavements, airports, and utilities. The journal delves into areas like artificial intelligence, cognitive modeling, concurrent engineering, database management, distributed computing, evolutionary computing, fuzzy logic, genetic algorithms, geometric modeling, internet-based technologies, knowledge discovery and engineering, machine learning, mobile computing, multimedia technologies, networking, neural network computing, optimization and search, parallel processing, robotics, smart structures, software engineering, virtual reality, and visualization techniques.