Combination models of random forest for predicting seismic liquefaction based on SPT, CPT, Vs databases considering sampling strategies
Authors: Jilei Hu, Lianming Huang, Qi Shao
Journal: Soil Dynamics and Earthquake Engineering, Volume 198, Article 109642 (Impact Factor 4.6; JCR Q1, Engineering, Geological)
Published: 2025-06-30
DOI: 10.1016/j.soildyn.2025.109642
URL: https://www.sciencedirect.com/science/article/pii/S026772612500435X
The sampling strategy has an important impact on the accuracy of seismic liquefaction discrimination models. In addition, different models may produce contradictory discrimination results. Based on three types of in-situ test data (standard penetration test (SPT), cone penetration test (CPT), and shear wave velocity (Vs)), this paper uses the Random Forest (RF) method to analyze the effects of five probabilistic sampling methods (Simple Random Sampling (SRS), Unordered Systematic Sampling (USS), Ordered Systematic Sampling (OSS), Stratified Random Sampling (StrRS), and Cluster Sampling (CS)) and five integration methods (sequential integration, voting, simple averaging, weighted averaging, and Bayesian model averaging) on RF models of seismic liquefaction. It constructs three RF models, one for each in-situ test database, together with a Combined RF Model (CRF). The results show that the sampling method has a large impact on RF model performance. OSS performed best across the in-situ test databases, with Acc = 0.9 and F1 = 0.930 for the RF-SPT model (the RF model based on the SPT data), Acc = 0.88 and F1 = 0.918 for the RF-CPT model (the RF model based on the CPT data), and Acc = 0.872 and F1 = 0.913 for the RF-Vs model (the RF model based on the Vs data), whereas CS performed worst across the datasets. A sensitivity analysis of the RF models under the optimal sampling method was also performed. For the combined models, integration does not always improve performance, and sequential integration failed to improve it in this study. However, the CRF based on Bayesian model averaging performed best, with Acc = 0.924 and F1 = 0.947, outperforming the RF-SPT model.
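The abstract names the sampling and integration schemes without defining them, so the following is only a minimal sketch under textbook definitions, not the authors' implementation. All function names, the sort key, the sampling step, and the example weights are hypothetical. Ordered systematic sampling is taken to mean "sort the records, then select every k-th one"; voting, simple averaging, and weighted averaging are applied to per-model liquefaction probabilities; Acc and F1 are the usual confusion-matrix metrics. Sequential integration and Bayesian model averaging are left out because their exact form cannot be recovered from the abstract.

```python
from typing import Callable, List, Sequence, Tuple


def ordered_systematic_split(records: Sequence, key: Callable, step: int,
                             start: int = 0) -> Tuple[List, List]:
    """Sort records by `key`, then put every `step`-th record (from `start`) in the test set."""
    ordered = sorted(records, key=key)
    test = [r for i, r in enumerate(ordered) if i % step == start]
    train = [r for i, r in enumerate(ordered) if i % step != start]
    return train, test


def majority_vote(probs: Sequence[float], threshold: float = 0.5) -> int:
    """Return 1 (liquefaction) if more than half of the models predict it."""
    votes = sum(1 for p in probs if p >= threshold)
    return 1 if votes * 2 > len(probs) else 0


def weighted_average(probs: Sequence[float], weights: Sequence[float]) -> float:
    """Weighted mean of model probabilities; equal weights give the simple average."""
    return sum(p * w for p, w in zip(probs, weights)) / sum(weights)


def accuracy_and_f1(y_true: Sequence[int], y_pred: Sequence[int]) -> Tuple[float, float]:
    """Compute Acc and F1 for binary labels, with 1 as the positive class."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    acc = (tp + tn) / len(y_true)
    denom = 2 * tp + fp + fn
    f1 = 2 * tp / denom if denom else 0.0
    return acc, f1


if __name__ == "__main__":
    # Hypothetical probabilities from three models (e.g. SPT-, CPT-, Vs-based) for one site.
    site_probs = [0.8, 0.4, 0.7]
    print(majority_vote(site_probs))
    print(weighted_average(site_probs, [1, 1, 1]))        # simple averaging
    print(weighted_average(site_probs, [0.5, 0.3, 0.2]))  # hypothetical weights
```

In this sketch, Bayesian model averaging would differ from weighted averaging only in how the weights are obtained: posterior model probabilities would take the place of the hand-picked values.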
About the journal:
The journal aims to encourage and enhance the role of mechanics and other disciplines as they relate to earthquake engineering by providing opportunities for the publication of the work of applied mathematicians, engineers and other applied scientists involved in solving problems closely related to the field of earthquake engineering and geotechnical earthquake engineering.
Emphasis is placed on new concepts and techniques, but case histories will also be published if they enhance the presentation and understanding of new technical concepts.