Predicting workplace absenteeism using machine learning: a pilot study in occupational health.

IF 2.7 | Zone 4, Medicine | JCR Q2 | Public, Environmental & Occupational Health
Pablo Llamas Blázquez
Journal of Occupational Medicine and Toxicology, 20(1):38. Published 2025-11-10. DOI: 10.1186/s12995-025-00482-5 (https://doi.org/10.1186/s12995-025-00482-5). PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12604190/pdf/

Abstract

Background: Workplace absenteeism represents a significant challenge for organizations and occupational health practitioners, with substantial implications for productivity, healthcare costs, and employee well-being. Traditional approaches to absenteeism management remain largely reactive, highlighting the need for predictive models that enable proactive interventions.

Objective: To develop and validate machine learning models for predicting workplace absenteeism patterns and identifying risk factors associated with prolonged absence in a pilot study framework, thereby demonstrating feasibility for evidence-based occupational health interventions.

Methods: This pilot study employed machine learning algorithms on a publicly available workplace absenteeism dataset from a Brazilian company (2007-2010) obtained from the UCI Machine Learning Repository. The dataset comprised 740 instances with 19 variables including demographic characteristics, clinical indicators (BMI, ICD-10 coded absence reasons), and occupational factors. Random Forest and Gradient Boosting algorithms were implemented for both classification of prolonged absences and regression of absence duration. Statistical outliers (> 30 h, 3.8% of cases) were excluded to focus on typical absence patterns.
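
The outlier exclusion and the two prediction targets described in the Methods can be illustrated with a minimal sketch. It uses made-up absence durations, not the UCI data. The abstract gives 30 h as the outlier cut-off but does not state how "prolonged" was defined for the classifier, so using the same 30 h threshold for the label is an assumption, and the function names are hypothetical.

```python
# Minimal sketch: derive a classification label and a regression subset
# from absence durations. The data below are invented for illustration.

OUTLIER_THRESHOLD_H = 30  # cut-off in hours quoted in the abstract


def split_targets(hours, threshold=OUTLIER_THRESHOLD_H):
    """Return (labels, typical_hours).

    labels        -- 1 if the absence exceeds the threshold, else 0
                     (ASSUMPTION: "prolonged" means > threshold)
    typical_hours -- durations kept for regression (<= threshold)
    """
    labels = [1 if h > threshold else 0 for h in hours]
    typical_hours = [h for h in hours if h <= threshold]
    return labels, typical_hours


def excluded_share(hours, threshold=OUTLIER_THRESHOLD_H):
    """Fraction of records that would be excluded as outliers."""
    return sum(1 for h in hours if h > threshold) / len(hours)


# Hypothetical absence durations in hours.
sample_hours = [2, 8, 4, 40, 1, 16, 120, 3, 8, 24]

labels, typical_hours = split_targets(sample_hours)
print(labels)          # classification target for every record
print(typical_hours)   # regression target, outliers removed
print(f"{excluded_share(sample_hours):.1%} of records exceed the cut-off")
```

In this toy sample two of ten records exceed the cut-off; in the study the reported share was 3.8% of cases.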

Results: The developed models demonstrated feasibility for workplace absenteeism prediction within this pilot framework. The Random Forest classification model achieved 84% accuracy (AUC = 0.89) for distinguishing between typical and prolonged absences. For duration prediction of typical absences (≤ 30 h), the Random Forest regression model yielded R² = 0.13, RMSE = 3.93 h, and MAE = 2.37 h. Key predictors included absence reason (ICD-10 classification), body mass index, and workload metrics, with notable interactions between workload intensity and specific absence categories.
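
For readers less familiar with the reported metrics, the sketch below computes accuracy, AUC, RMSE, MAE, and R² from their standard definitions in plain Python. All inputs are invented for illustration and the function names are hypothetical; this is not the authors' evaluation code, and the 0.5 decision threshold is an assumption.

```python
import math


def accuracy(y_true, y_pred):
    """Share of labels predicted correctly."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def auc(y_true, scores):
    """Probability that a random positive scores above a random negative
    (ties count as one half). Pairwise form of the ROC AUC."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def mae(y_true, y_pred):
    """Mean absolute error."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def rmse(y_true, y_pred):
    """Root mean squared error."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


def r2(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1 - ss_res / ss_tot


# Invented classification example: 1 = prolonged, 0 = typical.
labels = [1, 0, 0, 1, 0]
scores = [0.9, 0.2, 0.6, 0.5, 0.1]
preds = [1 if s >= 0.5 else 0 for s in scores]  # assumed 0.5 threshold

# Invented regression example: absence duration in hours.
hours_true = [2.0, 4.0, 6.0, 8.0]
hours_pred = [3.0, 4.0, 5.0, 8.0]

print(accuracy(labels, preds), auc(labels, scores))
print(mae(hours_true, hours_pred), rmse(hours_true, hours_pred), r2(hours_true, hours_pred))
```

By the definition used in `r2`, the reported R² = 0.13 means the regression model's squared error is about 13% lower than that of always predicting the mean duration.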

Conclusions: This pilot study demonstrates the feasibility of machine learning approaches for occupational health management by enabling identification of employees at risk for prolonged absenteeism. While showing promise for supporting personalized health interventions and resource allocation, implementation requires external validation across multiple organizations and careful consideration of ethical implications regarding employee privacy and algorithmic fairness.

Source journal
Journal of Occupational Medicine and Toxicology (Public, Environmental & Occupational Health)
CiteScore: 6.00
Self-citation rate: 0.00%
Articles published: 23
Review time: 19 weeks
About the journal: Aimed at clinicians and researchers, the Journal of Occupational Medicine and Toxicology is a multi-disciplinary, open access journal which publishes original research on the clinical and scientific aspects of occupational and environmental health. With high-quality peer review and quick decision times, we welcome submissions on the diagnosis, prevention, management, and scientific analysis of occupational diseases, injuries, and disability. The journal also covers the promotion of health of workers, their families, and communities, and ranges from rehabilitation to tropical medicine and public health aspects.