Generalization challenges and strategies in tunnel boring machine performance prediction

Shengfeng Huang, George Korfiatis, Rita Sousa
Civil Engineering Design, vol. 7, no. 2, pp. 63-84. Published 2025-02-10.
DOI: 10.1002/cend.202400047
Full text: https://onlinelibrary.wiley.com/doi/10.1002/cend.202400047
Citations: 0

Abstract

Achieving robust generalization in machine learning for tunnel boring machine performance prediction is challenging, particularly when models are developed on data from different projects. This study assesses the generalization abilities of K-nearest neighbors, support vector regression, artificial neural networks, random forest, classification and regression trees, and extreme gradient boosting (XGBoost) for predicting penetration rate (PR), and explores the potential of incremental learning to enhance generalization. The datasets were collected from two tunnels (Line C and Line S) that were constructed in similar geological formations, with similar technology, in Porto, Portugal. In the first part, the models were trained using Line C data and applied to Line S to assess generalization under different splitting and scaling methods. XGBoost demonstrated superior performance in both accuracy and generalization, making it the base model for incremental learning. In the second part, incremental learning, implemented by continually updating the XGBoost model as new data became available, was evaluated across different PR ranges and different increment sizes. Finally, the model trained using Line C data plus Line S data was compared to the model trained using Line S data only, to investigate the impact of including Line C data on generalization. Our findings show that for the imbalanced data of PR < 125 mm/rpm, incremental learning showed unstable generalization but exhibited improved generalization on PR < 25 mm/rpm. For the balanced data of PR < 25 mm/rpm, incremental learning showed more stable and gradually improving generalization. Combining Line C data with Line S data for training improved generalization significantly. The study results provide important insights into developing generalization strategies, highlighting the benefits of pre-training with similar data and the challenges of dealing with imbalanced data in real-life projects.
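
The pre-train-then-update protocol described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the data are synthetic stand-ins for Line C and Line S, the four features and the `make_line` and `incremental_eval` helpers are hypothetical, and scikit-learn's `GradientBoostingRegressor` with `warm_start=True` is swapped in for XGBoost as a generic boosted-tree model that can be extended with new trees. Each incoming chunk is scored before the model is updated on it, so every score reflects generalization to data not yet seen.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.metrics import r2_score


def make_line(n, shift, seed):
    """Synthetic stand-in for one tunnel's records (NOT the Porto data)."""
    rng = np.random.default_rng(seed)
    X = rng.normal(size=(n, 4))  # hypothetical machine/ground features
    noise = rng.normal(scale=0.5, size=n)
    # Shared response plus a project-specific term controlled by `shift`.
    y = 10 + 3 * X[:, 0] - 2 * X[:, 1] + shift * X[:, 2] + noise
    return X, y


def incremental_eval(X_pre, y_pre, X_new, y_new, chunk=100, trees_per_step=50):
    """Pre-train on one project, then update chunk by chunk on another.

    Each chunk is scored BEFORE the model sees it, so every score is an
    out-of-sample estimate of generalization to upcoming data.
    """
    model = GradientBoostingRegressor(
        n_estimators=trees_per_step, warm_start=True, random_state=0
    )
    model.fit(X_pre, y_pre)
    scores = []
    for start in range(0, len(X_new), chunk):
        Xc = X_new[start:start + chunk]
        yc = y_new[start:start + chunk]
        scores.append(r2_score(yc, model.predict(Xc)))
        # warm_start keeps the existing trees; the added trees are fit
        # on this chunk only.
        model.set_params(n_estimators=model.n_estimators + trees_per_step)
        model.fit(Xc, yc)
    return model, scores


X_c, y_c = make_line(800, shift=0.0, seed=1)  # "Line C" stand-in
X_s, y_s = make_line(400, shift=1.5, seed=2)  # "Line S" stand-in
model, scores = incremental_eval(X_c, y_c, X_s, y_s)
for i, s in enumerate(scores):
    print(f"chunk {i}: R2 before update = {s:.3f}")
```

The `chunk` argument plays the role of the increment size examined in the study; rerunning the loop with different values shows how update granularity affects the stability of the per-chunk scores on this toy data.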
