A prediction model for dyke-dam piping based on data augmentation and interpretable ensemble learning

IF 5.7 | Zone 2 (Engineering Technology) | JCR Q1 (ENGINEERING, MECHANICAL)
Xi Zhang, Yangyang Xia, Chao Zhang, Bokai Liu, Cuixia Wang, Hongyuan Fang, Jing Wang
Engineering Failure Analysis, vol. 182, Article 110174. Published 2025-09-30. Journal Article.
DOI: 10.1016/j.engfailanal.2025.110174
URL: https://www.sciencedirect.com/science/article/pii/S135063072500915X
Citations: 0

Abstract

Piping is one of the most common and hazardous problems in dyke and dam engineering, and it poses challenges for stability and risk assessment. This study proposes an interpretable ensemble learning model for predicting dyke and dam piping, based on the Synthetic Minority Over-sampling Technique (SMOTE) and Ensemble Learning (EL) algorithms, using a dataset collected from the Yangtze River. First, the piping dataset was visualized with violin plots, and SMOTE was applied to augment the imbalanced dataset. Next, t-distributed Stochastic Neighbor Embedding (t-SNE) and the Pearson correlation coefficient were used to assess the similarity between the newly generated samples and the original samples, which verified the effectiveness of the data augmentation. Six EL algorithms were then trained on the augmented dataset to build regression prediction models of piping. In a comprehensive comparison, the SMOTE-Categorical Boosting (SMOTE-CatBoost) model showed superior prediction accuracy and lower computational cost, with a goodness of fit (R²) of 0.9886 and a Root Mean Square Error (RMSE) of 0.05334, making it the ideal prediction model for dyke and dam piping. In addition, an Explainable Artificial Intelligence (XAI) model of piping was developed; it identified the overburden thickness of the weakly permeable layer (H), void ratio (e), water level height difference (Δh), and compression coefficient (a_v) as the four primary influencing factors of piping. The research offers a valuable reference for the advance monitoring of dyke and dam piping risk and contributes to the sustainable maintenance of dyke and dam engineering structures.
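The augmentation and evaluation steps named in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names (`smote_like`, `pearson`, `r2_score`, `rmse`), the neighbour count `k`, and the toy rows are all assumptions made for illustration, and the numbers are not taken from the Yangtze River dataset. The sketch uses only the Python standard library and shows SMOTE-style interpolation between a sample and one of its nearest neighbours, followed by the textbook definitions of the Pearson coefficient, R², and RMSE.

```python
import math
import random


def smote_like(samples, n_new, k=3, seed=0):
    """Create n_new synthetic rows by SMOTE-style interpolation.

    Each synthetic row lies on the segment between a randomly chosen
    row and one of its k nearest neighbours (Euclidean distance).
    """
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(samples)
        # k nearest neighbours of base, excluding base itself
        neighbours = sorted(
            (s for s in samples if s is not base),
            key=lambda s: math.dist(s, base),
        )[:k]
        other = rng.choice(neighbours)
        gap = rng.random()  # interpolation weight in [0, 1)
        synthetic.append([b + gap * (o - b) for b, o in zip(base, other)])
    return synthetic


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def r2_score(y_true, y_pred):
    """Goodness of fit: 1 - SS_res / SS_tot."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def rmse(y_true, y_pred):
    """Root mean square error."""
    n = len(y_true)
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / n)


# Hypothetical toy rows in the order [H, e, delta_h, a_v]; values are made up.
toy_rows = [
    [2.0, 0.80, 3.5, 0.30],
    [2.5, 0.75, 4.0, 0.28],
    [1.8, 0.90, 5.2, 0.35],
    [3.1, 0.70, 2.9, 0.25],
    [2.2, 0.85, 4.6, 0.32],
]
synthetic_rows = smote_like(toy_rows, n_new=4)
```

Because every synthetic row is a convex combination of two original rows, each of its features stays within the range spanned by the originals, which is the property that a similarity check such as t-SNE or a Pearson comparison is meant to confirm. In practice the regression step would use a library model such as CatBoost rather than hand-written code.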
Source journal
Engineering Failure Analysis (Engineering Technology: Materials Science, Characterization & Testing)
CiteScore: 7.70
Self-citation rate: 20.00%
Articles published: 956
Review time: 47 days
Journal description: Engineering Failure Analysis publishes research papers describing the analysis of engineering failures and related studies. Papers relating to the structure, properties and behaviour of engineering materials are encouraged, particularly those which also involve the detailed application of materials parameters to problems in engineering structures, components and design. In addition to the area of materials engineering, the interacting fields of mechanical, manufacturing, aeronautical, civil, chemical, corrosion and design engineering are considered relevant. Activity should be directed at analysing engineering failures and carrying out research to help reduce the incidence of failures and to extend the operating horizons of engineering materials. Emphasis is placed on the mechanical properties of materials and their behaviour when influenced by structure, process and environment. Metallic, polymeric, ceramic and natural materials are all included and the application of these materials to real engineering situations should be emphasised. The use of a case-study based approach is also encouraged. Engineering Failure Analysis provides essential reference material and critical feedback into the design process, thereby contributing to the prevention of engineering failures in the future. All submissions will be subject to peer review from leading experts in the field.