Benchmarking the performance and uncertainty of machine learning models in estimating scour depth at sluice outlets

IF 2.2 · CAS Zone 3 (Engineering & Technology) · JCR Q3, COMPUTER SCIENCE, INTERDISCIPLINARY APPLICATIONS
Xuan-Hien Le, T. H. Le, H. V. Ho, G. Lee
Journal of Hydroinformatics, Journal Article, published 2024-06-10
DOI: 10.2166/hydro.2024.297
Citations: 0

Abstract

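This study benchmarks six machine learning (ML) models, namely Random Forest (RF), Adaptive Boosting (ADA), CatBoost (CAT), Support Vector Machine (SVM), Lasso Regression (LAS), and Artificial Neural Network (ANN), against traditional empirical formulas for estimating the maximum scour depth downstream of sluice gates. The ML models generally outperform the empirical formulas, with correlation coefficients (CORR) ranging from 0.882 to 0.944, compared with 0.835 to 0.847 for the empirical methods. ANN performed best, followed closely by CAT with a CORR of 0.936, while RF, ADA, and SVM achieved competitive CORR values of around 0.928. Variable importance assessments identified the dimensionless densimetric Froude number (Fd) as highly influential, particularly in the RF, CAT, and LAS models, and SHAP (SHapley Additive exPlanations) value analysis provided insight into each predictor's impact on model outputs. Uncertainty assessment using Monte Carlo (MC) and Bootstrap (BS) methods with 1,000 iterations showed that the ML models can produce reliable uncertainty maps. ANN again led, with higher mean values and lower standard deviations, followed by CAT. MC results tended toward more optimistic predictions than BS, as reflected in the median values and interquartile ranges. Overall, the analysis underscores the efficacy of ML models in providing precise and reliable scour depth predictions.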
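For reference, a commonly used definition of the densimetric Froude number in local-scour studies is given below. The abstract does not state which velocity or grain size the authors used, so the choice of a characteristic velocity U (for example, the mean jet velocity at the gate opening) and of the median grain diameter d_50 is an assumption here, not necessarily the paper's exact formulation.

```latex
F_d = \frac{U}{\sqrt{g \, \dfrac{\rho_s - \rho}{\rho} \, d_{50}}}
```

Here g is gravitational acceleration, rho_s the sediment density and rho the water density. As a worked example with hypothetical values U = 2 m/s, d_50 = 1 mm and rho_s/rho = 2.65: F_d = 2 / sqrt(9.81 × 1.65 × 0.001) ≈ 2 / 0.127 ≈ 15.7.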
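The abstract summarizes model performance by CORR and quantifies uncertainty with a 1,000-iteration Bootstrap (BS) analysis reported through means, standard deviations, medians and interquartile ranges. The sketch below shows one generic way to obtain such a distribution for CORR, assuming a fixed set of held-out observations and predictions is resampled with replacement. It does not refit the model on each resample, whereas the authors' BS procedure may do so; the function names and the synthetic data are hypothetical, and the Monte Carlo (MC) variant, which perturbs inputs or parameters instead, is not shown.

```python
"""Sketch: bootstrap uncertainty of the correlation coefficient (CORR).

Assumption: predictions from an already-trained model are resampled; the
model is NOT refit on each resample. All data below are synthetic.
"""
import numpy as np


def pearson_corr(obs, pred):
    """Pearson correlation coefficient (CORR) between observed and predicted values."""
    return float(np.corrcoef(obs, pred)[0, 1])


def bootstrap_corr(obs, pred, n_iter=1000, seed=0):
    """Draw n_iter bootstrap resamples of the (obs, pred) pairs, with
    replacement, and return the CORR of each resample."""
    obs = np.asarray(obs, dtype=float)
    pred = np.asarray(pred, dtype=float)
    rng = np.random.default_rng(seed)
    n = obs.size
    scores = np.empty(n_iter)
    for i in range(n_iter):
        idx = rng.integers(0, n, size=n)  # row indices drawn with replacement
        scores[i] = pearson_corr(obs[idx], pred[idx])
    return scores


def summarize(scores):
    """Mean, standard deviation, median and interquartile range (IQR) of the scores."""
    q1, median, q3 = np.percentile(scores, [25, 50, 75])
    return {
        "mean": float(np.mean(scores)),
        "std": float(np.std(scores, ddof=1)),
        "median": float(median),
        "iqr": float(q3 - q1),
    }


if __name__ == "__main__":
    rng = np.random.default_rng(42)
    obs = rng.uniform(0.5, 3.0, size=60)         # synthetic "observed" scour depths
    pred = obs + rng.normal(0.0, 0.25, size=60)  # synthetic "model" predictions
    print(summarize(bootstrap_corr(obs, pred, n_iter=1000)))
```

Refitting each model on every resample would also capture training variability, at the cost of 1,000 model fits per ML method.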
Source journal: Journal of Hydroinformatics (Engineering & Technology; Engineering, Civil)
CiteScore: 4.80
Self-citation rate: 3.70%
Articles published: 59
Review time: 3 months
About the journal: Journal of Hydroinformatics is a peer-reviewed journal devoted to the application of information technology in the widest sense to problems of the aquatic environment. It promotes Hydroinformatics as a cross-disciplinary field of study, combining technological, human-sociological and more general environmental interests, including an ethical perspective.