A quality index for construction big data in shield tunneling

IF 7.5 | Zone 2 (Computer Science) | JCR Q1 (Automation & Control Systems)
Chao Zhang, Yuhao Ren, Qihang Huang, Renpeng Chen
Engineering Applications of Artificial Intelligence, vol. 156, Article 111023, published 2025-05-21
DOI: 10.1016/j.engappai.2025.111023
https://www.sciencedirect.com/science/article/pii/S0952197625010231
Citations: 0

Abstract

The quality of the dataset underpinning a data-driven model sets the upper limit of its performance, yet no quantitative measure of that quality exists for the construction big data generated in earth pressure balance (EPB) shield tunneling. Herein, a quality index is proposed to fill this gap and is formulated as the L2 norm of a vector composed of three components: accuracy, inclusiveness, and informativeness. The accuracy component is the ratio of non-outlier samples, so a dataset containing fewer outliers has a higher accuracy; it reflects the extent to which the dataset represents the real construction conditions during tunneling. The inclusiveness component is the normalized envelope area of the dataset after it is mapped into a two-dimensional space, reflecting the range of construction scenarios included in the dataset. The informativeness component is the dimensionless reduction in the uncertainty of a given data-driven model brought about by the dataset, reflecting the dataset's contribution to that model's predictions. The proposed quality index is assessed using a large database collected from multiple tunneling projects. A series of sub-datasets deliberately divided from this database are used to train data-driven models with three commonly used algorithms (random forest, neural network, and K-nearest neighbors) for three target functions of wide concern in tunneling: torque, thrust, and penetration. The quality index of the training data correlates strongly with the performance of the data-driven models (R-values > 0.91) regardless of algorithm, target function, and sample size. The proposed quality index serves as a theoretical basis for practical application scenarios such as training data selection and core dataset development. A practical application based on the Changsha project illustrated that a training dataset selected using the quality index can boost the performance of the developed data-driven models by more than 38% and reduce training time by more than 26%.
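The three-component structure described in the abstract can be illustrated with a minimal sketch. The abstract does not give the paper's exact procedures, so several choices here are assumptions made only for illustration: outlier flags are taken as given from some detector, the "envelope" is approximated by a convex hull of points already mapped to two dimensions, the normalizing `reference_area` is a hypothetical user-supplied value, and the informativeness score is passed in as a precomputed number. All function names are hypothetical.

```python
import math


def accuracy(is_outlier):
    """Ratio of non-outlier samples (flags assumed to come from any outlier detector)."""
    return 1.0 - sum(is_outlier) / len(is_outlier)


def convex_hull(points):
    """Convex hull of 2-D points via Andrew's monotone chain (assumed envelope)."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts

    def cross(o, a, b):
        return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])

    lower, upper = [], []
    for p in pts:
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    return lower[:-1] + upper[:-1]


def polygon_area(vertices):
    """Polygon area by the shoelace formula."""
    n = len(vertices)
    if n < 3:
        return 0.0
    total = 0.0
    for i in range(n):
        x1, y1 = vertices[i]
        x2, y2 = vertices[(i + 1) % n]
        total += x1 * y2 - x2 * y1
    return abs(total) / 2.0


def inclusiveness(points_2d, reference_area):
    """Envelope area divided by a reference area (hypothetical normalization), capped at 1."""
    return min(1.0, polygon_area(convex_hull(points_2d)) / reference_area)


def quality_index(acc, inc, info):
    """L2 norm of the (accuracy, inclusiveness, informativeness) vector."""
    return math.sqrt(acc ** 2 + inc ** 2 + info ** 2)


# Toy data: five samples already mapped to 2-D, one of them flagged as an outlier.
points = [(0.0, 0.0), (1.0, 0.0), (1.0, 1.0), (0.0, 1.0), (0.5, 0.5)]
flags = [False, False, False, True, False]
acc = accuracy(flags)                             # 4 of 5 samples are non-outliers
inc = inclusiveness(points, reference_area=2.0)   # unit-square hull over an assumed area of 2
info = 0.6                                        # assumed, precomputed informativeness
print(quality_index(acc, inc, info))
```

Because the index is a plain Euclidean norm of three components, a dataset scores higher only when it improves on at least one of them; how the authors scale or bound each component is not stated in the abstract and is not reproduced here.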
Source journal
Engineering Applications of Artificial Intelligence (Engineering & Technology: Engineering, Electrical & Electronic)
CiteScore: 9.60
Self-citation rate: 10.00%
Articles published: 505
Review time: 68 days
About the journal: Artificial Intelligence (AI) is pivotal in driving the fourth industrial revolution, witnessing remarkable advancements across various machine learning methodologies. AI techniques have become indispensable tools for practicing engineers, enabling them to tackle previously insurmountable challenges. Engineering Applications of Artificial Intelligence serves as a global platform for the swift dissemination of research elucidating the practical application of AI methods across all engineering disciplines. Submitted papers are expected to present novel aspects of AI utilized in real-world engineering applications, validated using publicly available datasets to ensure the replicability of research outcomes. Join us in exploring the transformative potential of AI in engineering.