Analyzing the influence of particle size distribution on the maximum shear modulus of soil with an interpretable machine learning framework and laboratory test dataset

IF 4.2, Zone 2 (Engineering and Technology), JCR Q1, ENGINEERING, GEOLOGICAL
Xingyang Liu , Degao Zou , Yuan Chen , Huafu Pei , Zhanchao Li , Linsong Sun , Laifu Song
Journal: Soil Dynamics and Earthquake Engineering, vol. 188, Article 109031
Publication date: 2024-10-26
Publication type: Journal Article
DOI: 10.1016/j.soildyn.2024.109031
URL: https://www.sciencedirect.com/science/article/pii/S0267726124005839
Citation count: 0

Abstract

The maximum shear modulus (Gmax) is a key parameter for characterizing the dynamic properties of soils. In this research, a dataset was systematically compiled from the literature. It comprises 2782 instances of Gmax values and their influencing factors for various soil types and is intended for examining the effect of particle size distribution on Gmax. The eXtreme Gradient Boosting (XGBoost) algorithm was used to develop the predictive model for Gmax, and the model's performance was then improved with the Bayesian Optimization (BO) algorithm. After comparison with other empirical models, the BO-XGBoost model was selected as the best model. Finally, the predictions of BO-XGBoost were interpreted using the SHapley Additive exPlanations (SHAP) framework to address the black-box problem of traditional machine learning methods. The results suggest that SHAP effectively extracts critical information from the data when data labels are appropriately configured, thereby increasing the reliability of the prediction outcomes. Globally, the feature importance ranking and the direction of the correlations between the input features and the output variable align with prior knowledge. Locally, however, the importance ranking of features for individual samples may deviate from the global trend, and the influence of the same input feature can vary across samples.
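The local-versus-global point in the abstract can be illustrated with a small sketch. SHAP is built on Shapley values, and for a handful of features these can be computed exactly by enumerating feature coalitions. The code below is only an illustration under stated assumptions: `toy_model` is a made-up function standing in for a trained regressor (it is not the BO-XGBoost model), the feature names and sample values are hypothetical, and features outside a coalition are simply replaced by a single baseline sample.

```python
from itertools import combinations
from math import factorial, sqrt

# Hypothetical feature names; not the paper's actual input variables.
FEATURES = ("void_ratio", "confining_pressure_kpa", "uniformity_coeff")


def toy_model(x):
    """Made-up stand-in for a trained regressor (not the BO-XGBoost model)."""
    e, p, cu = x
    return 100.0 * sqrt(p) / ((0.3 + 0.7 * e * e) * (1.0 + 0.05 * cu))


def shapley_values(model, x, baseline):
    """Exact Shapley values of model(x) relative to model(baseline).

    Features outside a coalition are set to their baseline value.
    """
    n = len(x)

    def value(coalition):
        mixed = [x[i] if i in coalition else baseline[i] for i in range(n)]
        return model(mixed)

    phi = []
    for i in range(n):
        others = [j for j in range(n) if j != i]
        total = 0.0
        for size in range(n):  # coalition sizes 0 .. n-1
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = set(subset)
                total += weight * (value(s | {i}) - value(s))
        phi.append(total)
    return phi


def rank_features(phi):
    """Feature names ordered by absolute contribution, largest first."""
    order = sorted(range(len(phi)), key=lambda i: abs(phi[i]), reverse=True)
    return [FEATURES[i] for i in order]


# Arbitrary illustrative values, not taken from the paper's dataset.
BASELINE = (0.70, 100.0, 5.0)
SAMPLE_A = (0.40, 120.0, 6.0)   # differs from the baseline mainly in void ratio
SAMPLE_B = (0.68, 400.0, 4.0)   # differs mainly in confining pressure

if __name__ == "__main__":
    for name, sample in (("A", SAMPLE_A), ("B", SAMPLE_B)):
        phi = shapley_values(toy_model, sample, BASELINE)
        print(name, rank_features(phi), [round(v, 1) for v in phi])
```

In this toy setting the top-ranked feature differs between the two samples, and the third feature contributes with opposite signs, which mirrors the abstract's remark that per-sample rankings and influences can depart from the global picture. For tree ensembles, the SHAP library uses a dedicated tree algorithm rather than this brute-force enumeration, which grows exponentially with the number of features.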
Source journal
Soil Dynamics and Earthquake Engineering (Engineering and Technology: Geosciences, Multidisciplinary)
CiteScore: 7.50
Self-citation rate: 15.00%
Articles published: 446
Review time: 8 months
About the journal: The journal aims to encourage and enhance the role of mechanics and other disciplines as they relate to earthquake engineering by providing opportunities for the publication of the work of applied mathematicians, engineers and other applied scientists involved in solving problems closely related to the field of earthquake engineering and geotechnical earthquake engineering. Emphasis is placed on new concepts and techniques, but case histories will also be published if they enhance the presentation and understanding of new technical concepts.