[Development and validation of risk assessment models for abnormal lung function in coal workers based on machine learning].

Y X Zhu, K Y Guo, C Yang, Y X Zhang, H Zhu, Y L Jin
中华劳动卫生职业病杂志, vol. 43, no. 5, pp. 332-337. Published 2025-05-20. JCR: Q3 (Medicine). Publication type: Journal Article.
DOI: https://doi.org/10.3760/cma.j.cn121094-20240328-00127

Abstract


Objective: To analyze the factors influencing lung function in coal miners, identify the optimal combination of indicators for evaluating lung function, develop a machine learning risk assessment model, and provide individualized risk assessment for workers.

Methods: In June 2023, cluster sampling was used to select male underground workers who underwent occupational health examinations at a coal mine in North China from July to August 2018. Their health examination results and occupational environment data were collected, and a total of 3,320 coal miners were included. The subjects were randomly divided into a training set (2,324 people) and a validation set (996 people) at a ratio of 7:3, and the balance between the two sets was tested. LASSO regression was performed in R 4.2.2 to screen important variables, and the model input variables were finalized with reference to the relevant literature. Logistic regression, random forest, support vector machine, and XGBoost models were built in Python 3.8. Discrimination was assessed with accuracy, sensitivity, specificity, F1 score, the ROC curve, and AUC; calibration was assessed with the Brier score, log loss, and calibration curves; clinical utility was assessed with decision curve analysis (DCA).

Results: Of the 3,320 coal miners, 856 (25.78%) had abnormal lung function. XGBoost was identified as the optimal model. On the training set it achieved an accuracy of 87.39%, sensitivity of 86.60%, specificity of 87.67%, F1 score of 0.779, AUC of 0.945, Brier score of 0.071, and log loss of 0.267, and its calibration curve showed good agreement.

Conclusion: The XGBoost model showed better predictive performance than the other models and has high application value. The model was interpreted with the SHapley Additive exPlanations (SHAP) method, and it can serve as a reliable basis for preventing abnormal lung function in coal miners.
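The evaluation metrics named in the Methods can be made concrete with a small sketch. The Python below is illustrative only and is not the authors' code: the abstract does not say which libraries were used, so everything is written against the standard library, and all function names (`confusion_counts`, `discrimination_metrics`, `brier_score`, `log_loss`, `net_benefit`, `implied_accuracy_and_f1`) are hypothetical. The last function checks whether the reported accuracy and F1 score are arithmetically consistent with the reported sensitivity and specificity, under the assumption (not stated in the abstract) that the training-set prevalence equals the overall 856/3,320.

```python
import math


def confusion_counts(y_true, y_prob, threshold=0.5):
    """Count (tp, fp, tn, fn) after thresholding predicted probabilities."""
    tp = fp = tn = fn = 0
    for y, p in zip(y_true, y_prob):
        predicted = 1 if p >= threshold else 0
        if predicted == 1 and y == 1:
            tp += 1
        elif predicted == 1 and y == 0:
            fp += 1
        elif predicted == 0 and y == 0:
            tn += 1
        else:
            fn += 1
    return tp, fp, tn, fn


def discrimination_metrics(tp, fp, tn, fn):
    """Accuracy, sensitivity, specificity and F1 from confusion counts."""
    return {
        "accuracy": (tp + tn) / (tp + fp + tn + fn),
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "f1": 2 * tp / (2 * tp + fp + fn),
    }


def brier_score(y_true, y_prob):
    """Mean squared difference between predicted probability and outcome."""
    return sum((p - y) ** 2 for y, p in zip(y_true, y_prob)) / len(y_true)


def log_loss(y_true, y_prob, eps=1e-15):
    """Mean negative log-likelihood; probabilities are clipped to avoid log(0)."""
    total = 0.0
    for y, p in zip(y_true, y_prob):
        p = min(max(p, eps), 1 - eps)
        total -= y * math.log(p) + (1 - y) * math.log(1 - p)
    return total / len(y_true)


def net_benefit(tp, fp, n, threshold):
    """Decision-curve net benefit at a given probability threshold."""
    return tp / n - (fp / n) * threshold / (1 - threshold)


def implied_accuracy_and_f1(prevalence, sensitivity, specificity):
    """Accuracy and F1 implied by prevalence, sensitivity and specificity."""
    tp = prevalence * sensitivity          # true-positive fraction
    fn = prevalence - tp                   # false-negative fraction
    tn = (1 - prevalence) * specificity    # true-negative fraction
    fp = (1 - prevalence) - tn             # false-positive fraction
    return tp + tn, 2 * tp / (2 * tp + fp + fn)


# Assumption: training-set prevalence equals the overall 856 / 3320.
accuracy, f1 = implied_accuracy_and_f1(856 / 3320, 0.8660, 0.8767)
```

Under that prevalence assumption, the implied accuracy is about 0.874 and the implied F1 score is about 0.78, which is in line with the reported 87.39% and 0.779. A decision curve is obtained by evaluating `net_benefit` over a range of thresholds and comparing the result with the treat-all and treat-none strategies.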
