Source identification of mine water inrush based on GBDT-RS-SHAP

IF 2.8, Zone 4 (Environmental Science and Ecology), JCR Q3 (ENVIRONMENTAL SCIENCES)
Zhenwei Yang, Han Li, Xinyi Wang, Hongwei Meng, Tong Xi, Zhenhuan Hou
Journal: Environmental Earth Sciences, Volume 84, Issue 4
Publication date: 2025-02-14
Publication type: Journal Article
DOI: 10.1007/s12665-025-12107-5
URL: https://link.springer.com/article/10.1007/s12665-025-12107-5
Citations: 0

Abstract

An interpretable water source identification model that integrates gradient boosting decision trees (GBDT) with SHapley Additive exPlanations (SHAP) has been developed to improve safety in coal mining operations. To limit the impact of outliers on model accuracy during training, box plots and multivariate distribution matrix plots were used to detect outliers, which were then removed from the sample. The processed dataset was used to train the GBDT, yielding a final classification model based on the gradient of the residuals. The model's hyperparameters (number of trees, tree depth, and learning rate) were optimized with a random search algorithm to improve predictive performance. Using measured data from water samples collected in the Pingdingshan Coalfield, cross-validation yielded a maximum precision of 0.857 and an average precision of 0.602. When the optimized GBDT model was applied to 24 unknown water samples, it achieved an accuracy of 95.8%, with a single misclassification, and a root mean square error (RMSE) of 0.183. This indicates that random search optimization improves the model's stability and robustness, addresses the inefficiency and inaccuracy of coal mine water source identification, and supports water hazard prevention and control in coal mining. To make the model's output transparent, the study uses SHAP, a Python package for explaining the predictions of machine learning models. The results show that variations in Ca²⁺ concentration have a substantial effect on the discrimination outcome, whereas the contribution of SO₄²⁻ is negligible and can be disregarded. This provides a reference framework for identifying water sources in mine water emergencies.
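
The workflow described in the abstract (box-plot outlier removal, GBDT training, random search over the number of trees, tree depth, and learning rate, with cross-validation) could be sketched roughly as follows. This is only an illustration under stated assumptions, not the authors' implementation: the data are synthetic placeholders rather than the Pingdingshan measurements, the helper `iqr_mask`, the 1.5 × IQR threshold, the search ranges, and the choice of scikit-learn are all assumptions, and the SHAP step is omitted.

```python
import numpy as np
from scipy.stats import randint, uniform
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import RandomizedSearchCV, train_test_split

rng = np.random.default_rng(0)

# Synthetic stand-in for hydrochemical samples: three hypothetical water
# sources, four features each (invented values, NOT the paper's data).
centers = np.array([[60.0, 20.0, 300.0, 50.0],
                    [120.0, 35.0, 250.0, 200.0],
                    [30.0, 10.0, 450.0, 20.0]])
X = np.vstack([c + rng.normal(0.0, 10.0, size=(60, 4)) for c in centers])
y = np.repeat(np.arange(3), 60)


def iqr_mask(X, k=1.5):
    """Box-plot rule (assumed threshold k=1.5): keep rows whose every
    feature lies within [Q1 - k*IQR, Q3 + k*IQR]."""
    q1, q3 = np.percentile(X, [25, 75], axis=0)
    iqr = q3 - q1
    return np.all((X >= q1 - k * iqr) & (X <= q3 + k * iqr), axis=1)


# 1) Drop samples flagged as outliers by the box-plot rule.
keep = iqr_mask(X)
X_clean, y_clean = X[keep], y[keep]

# 2) Hold out a test split to mimic classifying "unknown" samples.
X_train, X_test, y_train, y_test = train_test_split(
    X_clean, y_clean, test_size=0.2, stratify=y_clean, random_state=0)

# 3) Random search over the three hyperparameters named in the abstract.
#    The ranges below are illustrative assumptions.
param_distributions = {
    "n_estimators": randint(20, 101),       # number of trees: 20..100
    "max_depth": randint(2, 6),             # tree depth: 2..5
    "learning_rate": uniform(0.01, 0.29),   # learning rate in [0.01, 0.30]
}
search = RandomizedSearchCV(
    GradientBoostingClassifier(random_state=0),
    param_distributions=param_distributions,
    n_iter=8, cv=3, scoring="accuracy", random_state=0)
search.fit(X_train, y_train)

# 4) Evaluate the refitted best model on the held-out samples.
best_params = search.best_params_
test_accuracy = search.score(X_test, y_test)
print(best_params, round(test_accuracy, 3))
```

Here the outlier rule is applied to the pooled sample for simplicity; applying it per water source would be an equally plausible reading of the abstract. The fitted `search.best_estimator_` is the model that an explanation step would then inspect.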

Source journal
Environmental Earth Sciences (Environmental Sciences - Geosciences, Multidisciplinary)
CiteScore: 5.10
Self-citation rate: 3.60%
Articles published: 494
Review time: 8.3 months
Journal description: Environmental Earth Sciences is an international multidisciplinary journal concerned with all aspects of interaction between humans, natural resources, ecosystems, special climates or unique geographic zones, and the earth. Topics include: water and soil contamination caused by waste management and disposal practices; environmental problems associated with transportation by land, air, or water; geological processes that may impact biosystems or humans; man-made or naturally occurring geological or hydrological hazards; environmental problems associated with the recovery of materials from the earth; environmental problems caused by extraction of minerals, coal, and ores, as well as oil and gas, water and alternative energy sources; environmental impacts of exploration and recultivation; environmental impacts of hazardous materials; management of environmental data and information in data banks and information systems; and dissemination of knowledge on techniques, methods, approaches and experiences to improve and remediate the environment. In pursuit of these topics, the geoscientific disciplines are invited to contribute their knowledge and experience. Major disciplines include: hydrogeology, hydrochemistry, geochemistry, geophysics, engineering geology, remediation science, natural resources management, environmental climatology and biota, environmental geography, soil science and geomicrobiology.