Review and comparison of machine learning methods in developing optimal models for predicting geotechnical properties with consideration of feature selection
Authors: Tengyuan Zhao, Fenglin Shen, Ling Xu
Journal: Soils and Foundations, Volume 64, No. 6, Article 101523
Publication date: 2024-10-22
Publication type: Journal Article
DOI: 10.1016/j.sandf.2024.101523
URL: https://www.sciencedirect.com/science/article/pii/S003808062400101X
Impact factor: 3.3 (JCR Q2, ENGINEERING, GEOLOGICAL)
Citations: 0
Abstract
Geotechnical properties, such as cohesion, pile drivability, and rock strength, are among the most important and indispensable inputs for the design or analysis of geotechnical/geological engineering projects. Conventionally, these properties are obtained from laboratory experiments on well-prepared samples or from well-designed in-situ experiments. Although direct measurements are generally accurate, they are often time-consuming and laborious, and acquiring numerous measurements is often not feasible. This is especially true for small- and medium-sized projects. Alternatively, the properties of interest can be predicted from readily available indices by machine learning (ML) methods, which have been increasingly applied to geotechnical engineering in recent years. Although ML methods perform reasonably well in predicting target geotechnical properties, all features subjectively considered relevant are often taken as input to the developed model. However, not all features contribute equally to the prediction. Including irrelevant indices in an ML model increases model complexity, makes the results harder to interpret, and risks degrading the model’s generalization ability. Although these points are well recognized in the literature, only a few studies have carried out feature selection when applying ML methods to geotechnical/geological engineering. This paper aims to address this gap by offering a comprehensive review and comparison of commonly used ML methods, with consideration of various methods for feature selection. Selecting relevant features for the problem at hand also agrees well with the spirit of the “data first practice central agenda” in data-centric geotechnics. Both simulated and real-life datasets are used to compare the performance of the various ML methods in feature selection and prediction. Results show that fully Bayesian-Gaussian process regression (fB-GPR) outperforms the other ML models.
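To make the idea of feature selection on a simulated dataset concrete, the following is a minimal sketch only. It is not the fB-GPR model or any other method compared in the paper; it uses a plain correlation filter as a stand-in. The simulated data, the coefficients (3.0 and -2.0), the noise level, and the 0.3 threshold are all assumptions chosen for illustration.

```python
import math
import random


def pearson(xs, ys):
    """Sample Pearson correlation coefficient between two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def simulate(n_samples=500, n_features=5, seed=0):
    """Hypothetical simulated dataset: only features 0 and 1 drive the target."""
    rng = random.Random(seed)
    X = [[rng.gauss(0.0, 1.0) for _ in range(n_features)]
         for _ in range(n_samples)]
    # Features 2..4 are irrelevant by construction; the last term is noise.
    y = [3.0 * row[0] - 2.0 * row[1] + rng.gauss(0.0, 0.5) for row in X]
    return X, y


def select_features(X, y, threshold=0.3):
    """Keep indices of features whose |correlation| with y exceeds threshold."""
    n_features = len(X[0])
    scores = []
    for j in range(n_features):
        column = [row[j] for row in X]
        scores.append(abs(pearson(column, y)))
    selected = [j for j, s in enumerate(scores) if s > threshold]
    return selected, scores


X, y = simulate()
selected, scores = select_features(X, y)
print("selected feature indices:", selected)
print("absolute correlations:", [round(s, 3) for s in scores])
```

A correlation filter only detects linear, one-at-a-time relevance, so it is a much cruder tool than the model-based approaches the abstract refers to; it is shown here only because it makes the distinction between relevant and irrelevant inputs easy to see.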
About the journal:
Soils and Foundations is one of the leading journals in the field of soil mechanics and geotechnical engineering. It is the official journal of the Japanese Geotechnical Society (JGS). The journal publishes a variety of original research papers, technical reports, and technical notes, as well as state-of-the-art reports upon invitation by the Editor, in the fields of soil and rock mechanics, geotechnical engineering, and environmental geotechnics. First published in June 1960 (Volume 1, No. 1), Soils and Foundations marked its 60th anniversary in 2020.
Soils and Foundations welcomes theoretical as well as practical work associated with the aforementioned fields. Case studies that describe original and interdisciplinary work applicable to geotechnical engineering are particularly encouraged. Discussions of published articles are also welcomed, to provide an avenue through which the opinions of peers may be fed back or exchanged. To provide the latest expertise on specific topics, on average one of the six issues per year has been allocated to selected papers from International Symposia held in Japan and overseas.