Generalization challenges and strategies in tunnel boring machine performance prediction
Shengfeng Huang, George Korfiatis, Rita Sousa
Civil Engineering Design, 7(2), pp. 63-84. Published 2025-02-10. DOI: 10.1002/cend.202400047
https://onlinelibrary.wiley.com/doi/10.1002/cend.202400047
Achieving robust generalization in machine learning for tunnel boring machine performance prediction is challenging, particularly when models are developed on data from different projects. This study assesses the generalization abilities of K-nearest neighbors, support vector regression, artificial neural networks, random forest, classification and regression trees, and extreme gradient boosting (XGBoost) for predicting penetration rate (PR), and explores the potential of incremental learning to enhance generalization. The datasets were collected from two tunnels (Line C and Line S) constructed in similar geological formations, with similar technology, in Porto, Portugal. In the first part, the models were trained on Line C data and applied to Line S to assess generalization under different splitting and scaling methods. XGBoost demonstrated superior accuracy and generalization, making it the base model for incremental learning. In the second part, incremental learning, applied by continually updating the XGBoost model as new data became available, was evaluated across different PR ranges and increment sizes. Finally, the model trained on Line C plus Line S data was compared with the model trained on Line S data only, to investigate the impact of including Line C data on generalization. Our findings show that for the imbalanced data of PR < 125 mm/rpm, incremental learning showed unstable generalization but exhibited improved generalization on PR < 25 mm/rpm. For the balanced data of PR < 25 mm/rpm, incremental learning showed more stable and gradually improving generalization. Combining Line C data with Line S data for training improved generalization significantly. The results provide insights into developing generalization strategies, highlighting the benefits of pre-training with similar data and the challenges of dealing with imbalanced data in real-life projects.
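
The incremental evaluation described in the abstract can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration, not the authors' implementation: the data are synthetic (they are not TBM records), the model is a small pure-Python K-nearest-neighbors regressor standing in for XGBoost so that the example has no dependencies, and the "score the next batch, then add it to the training pool" loop is one plausible reading of continually updating a model as new data arrive. The names `make_project`, `incremental_evaluation`, `line_c`, and `line_s` are hypothetical.

```python
import math
import random


def knn_predict(train_X, train_y, x, k=3):
    """Predict by averaging the targets of the k nearest training points."""
    nearest = sorted(
        (math.dist(row, x), target) for row, target in zip(train_X, train_y)
    )[:k]
    return sum(target for _, target in nearest) / len(nearest)


def rmse(train_X, train_y, test_X, test_y, k=3):
    """Root-mean-square error of the k-NN predictor on a test set."""
    errors = [(knn_predict(train_X, train_y, x, k) - y) ** 2
              for x, y in zip(test_X, test_y)]
    return math.sqrt(sum(errors) / len(errors))


def make_project(n, offset, seed):
    """Synthetic stand-in for one tunnel's records (NOT real TBM data).

    Two made-up features in [0, 1]; `offset` shifts the target so the two
    projects are similar but not identical.
    """
    rng = random.Random(seed)
    X = [[rng.uniform(0, 1), rng.uniform(0, 1)] for _ in range(n)]
    y = [10 * a + 5 * b + offset + rng.gauss(0, 0.1) for a, b in X]
    return X, y


def incremental_evaluation(src_X, src_y, tgt_X, tgt_y, increment, k=3):
    """Score each new target batch, then add that batch to the training pool."""
    pool_X, pool_y = list(src_X), list(src_y)  # start from source-project data
    scores = []
    for start in range(0, len(tgt_X), increment):
        batch_X = tgt_X[start:start + increment]
        batch_y = tgt_y[start:start + increment]
        # Evaluate before the batch is seen, so each score is out-of-sample.
        scores.append(rmse(pool_X, pool_y, batch_X, batch_y, k))
        pool_X.extend(batch_X)
        pool_y.extend(batch_y)
    return scores


line_c = make_project(200, offset=0.0, seed=1)  # hypothetical source project
line_s = make_project(120, offset=2.0, seed=2)  # hypothetical target project
scores = incremental_evaluation(*line_c, *line_s, increment=30)
print([round(s, 2) for s in scores])  # one RMSE per increment of target data
```

Changing `increment` mimics the comparison of increment sizes mentioned in the abstract. With a boosted-tree model, the pool-extension step would instead be a continued-training call on the existing model.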