Xi Zhang, Yangyang Xia, Chao Zhang, Bokai Liu, Cuixia Wang, Hongyuan Fang, Jing Wang
A prediction model for dyke-dam piping based on data augmentation and interpretable ensemble learning
Engineering Failure Analysis, Volume 182, Article 110174. Published 2025-09-30. DOI: 10.1016/j.engfailanal.2025.110174
https://www.sciencedirect.com/science/article/pii/S135063072500915X
Abstract
Piping is one of the most common and hazardous issues in dyke and dam engineering, posing challenges for dyke and dam stability and risk assessment. In this study, an interpretable ensemble learning model for predicting dyke and dam piping was proposed, based on the Synthetic Minority Over-sampling Technique (SMOTE) and Ensemble Learning (EL) algorithms, using a dataset collected from the Yangtze River. Initially, the piping dataset was visualized using violin plots, and SMOTE was adopted to augment the imbalanced dataset. Then, t-distributed Stochastic Neighbor Embedding (t-SNE) and the Pearson correlation coefficient were used to assess the similarity between the newly generated samples and the original samples, which verified the effectiveness of the data augmentation. Subsequently, based on the augmented dataset, six EL algorithms were employed to establish regression prediction models of piping. In a comprehensive comparison, the SMOTE-Categorical Boosting (SMOTE-CatBoost) model exhibited superior prediction accuracy and lower computational cost, with a goodness of fit (R²) of 0.9886 and a Root Mean Square Error (RMSE) of 0.05334, making it the most suitable prediction model for dyke and dam piping among those compared. Additionally, an Explainable Artificial Intelligence (XAI) model of piping was developed, and it was found that the thickness of the weakly permeable overburden layer (H), void ratio (e), water level height difference (Δh), and compression coefficient (a_v) are the four primary influencing factors of piping. The research offers a valuable reference for the advance monitoring of dyke and dam piping risk and contributes to the sustainable maintenance of dyke and dam engineering structures.
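To make two ideas from the abstract concrete, the sketch below illustrates the interpolation step at the core of SMOTE-style augmentation and the two reported metrics, RMSE and R². It is a minimal illustration in plain Python and not the authors' implementation: the function names, the neighbour count `k`, and the toy feature rows are all assumptions made for this example. A real study would more likely rely on an established SMOTE implementation and a gradient-boosting library.

```python
import math
import random


def smote_like_samples(rows, n_new, k=3, seed=0):
    """Create n_new synthetic rows by interpolating between a randomly
    chosen row and one of its k nearest neighbours (Euclidean distance)."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(rows)
        # Candidate neighbours are all rows except the base row itself.
        others = [r for r in rows if r is not base]
        neighbours = sorted(others, key=lambda r: math.dist(base, r))[:k]
        mate = rng.choice(neighbours)
        u = rng.random()  # interpolation weight in [0, 1)
        # New point lies on the segment between base and mate.
        synthetic.append([b + u * (m - b) for b, m in zip(base, mate)])
    return synthetic


def rmse(y_true, y_pred):
    """Root Mean Square Error between two equal-length sequences."""
    n = len(y_true)
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / n)


def r2(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


# Made-up toy rows in the order [H, e, delta_h, a_v]; NOT data from the paper.
toy_rows = [
    [2.0, 0.80, 3.0, 0.30],
    [2.5, 0.85, 3.5, 0.32],
    [3.0, 0.90, 4.0, 0.35],
    [3.5, 0.95, 4.5, 0.40],
]
new_rows = smote_like_samples(toy_rows, n_new=5)
```

Because every synthetic row is a convex combination of two existing rows, each of its features stays within the range spanned by the original data, which is one reason a similarity check against the original samples (such as the t-SNE and Pearson comparison described above) is a natural follow-up.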