Title: Predicting workplace absenteeism using machine learning: a pilot study in occupational health.
Author: Pablo Llamas Blázquez
Journal: Journal of Occupational Medicine and Toxicology, 20(1), article 38
Publication date: 2025-11-10
Publication type: Journal Article
DOI: https://doi.org/10.1186/s12995-025-00482-5
Full-text PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12604190/pdf/
Journal impact factor: 2.7 (JCR Q2, Public, Environmental & Occupational Health)
Citations: 0
Abstract
Background: Workplace absenteeism represents a significant challenge for organizations and occupational health practitioners, with substantial implications for productivity, healthcare costs, and employee well-being. Traditional approaches to absenteeism management remain largely reactive, highlighting the need for predictive models that enable proactive interventions.
Objective: To develop and validate machine learning models for predicting workplace absenteeism patterns and identifying risk factors associated with prolonged absence in a pilot study framework, thereby demonstrating feasibility for evidence-based occupational health interventions.
Methods: This pilot study employed machine learning algorithms on a publicly available workplace absenteeism dataset from a Brazilian company (2007-2010) obtained from the UCI Machine Learning Repository. The dataset comprised 740 instances with 19 variables including demographic characteristics, clinical indicators (BMI, ICD-10 coded absence reasons), and occupational factors. Random Forest and Gradient Boosting algorithms were implemented for both classification of prolonged absences and regression of absence duration. Statistical outliers (> 30 h, 3.8% of cases) were excluded to focus on typical absence patterns.
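The outlier-exclusion step described in the Methods can be sketched as follows. This is a minimal illustration, not the author's code: only the 30 h cutoff comes from the abstract, while the function names and the sample durations are hypothetical and invented for the example.

```python
from typing import List, Sequence, Tuple

# Cutoff stated in the abstract: absences longer than 30 h are treated as outliers.
OUTLIER_CUTOFF_HOURS = 30.0


def split_outliers(
    hours: Sequence[float], cutoff: float = OUTLIER_CUTOFF_HOURS
) -> Tuple[List[float], List[float]]:
    """Separate typical absences (<= cutoff) from statistical outliers (> cutoff)."""
    typical = [h for h in hours if h <= cutoff]
    outliers = [h for h in hours if h > cutoff]
    return typical, outliers


def excluded_share(hours: Sequence[float], cutoff: float = OUTLIER_CUTOFF_HOURS) -> float:
    """Fraction of records that the cutoff removes from the analysis."""
    if not hours:
        raise ValueError("hours must not be empty")
    _, outliers = split_outliers(hours, cutoff)
    return len(outliers) / len(hours)


# Made-up absence durations in hours, for illustration only (not the UCI data).
sample_hours = [2, 8, 3, 40, 1, 8, 16, 120, 4, 8]
typical, outliers = split_outliers(sample_hours)
print("typical:", typical)
print("outliers:", outliers)
print("excluded share:", excluded_share(sample_hours))
```

Reporting the excluded share alongside the cutoff, as the abstract does with its 3.8% figure, makes it clear how much of the data the regression results no longer cover.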
Results: The developed models demonstrated feasibility for workplace absenteeism prediction within this pilot framework. The Random Forest classification model achieved 84% accuracy (AUC = 0.89) for distinguishing between typical and prolonged absences. For duration prediction of typical absences (≤ 30 h), the Random Forest regression model yielded R² = 0.13, RMSE = 3.93 h, and MAE = 2.37 h. Key predictors included absence reason (ICD-10 classification), body mass index, and workload metrics, with notable interactions between workload intensity and specific absence categories.
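For readers unfamiliar with the metrics reported above, the sketch below shows how accuracy, AUC, R², RMSE, and MAE are commonly defined. It is a plain-Python illustration under the standard textbook definitions, not the author's evaluation code; the function names and toy inputs are hypothetical. Under this definition, R² = 0.13 means the regression model accounts for about 13% of the variance in absence duration on the evaluated data.

```python
import math
from typing import Sequence


def accuracy(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Share of predictions that match the true class labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def roc_auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as the probability that a random positive case is scored above
    a random negative case (ties count as one half)."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("both classes must be present")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def mae(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Mean absolute error, in the units of the target (hours here)."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def rmse(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Root mean squared error; penalises large misses more than MAE does."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


def r2(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


# Toy values, invented for illustration only.
labels, scores = [1, 0, 1, 0], [0.9, 0.2, 0.4, 0.6]
hours_true, hours_pred = [2.0, 4.0, 6.0, 8.0], [3.0, 4.0, 5.0, 8.0]
print(accuracy(labels, [1, 0, 0, 1]), roc_auc(labels, scores))
print(r2(hours_true, hours_pred), rmse(hours_true, hours_pred), mae(hours_true, hours_pred))
```

RMSE is never smaller than MAE, and a wide gap between them (3.93 h versus 2.37 h in the abstract) usually indicates that a minority of absences are predicted much less accurately than the rest.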
Conclusions: This pilot study demonstrates the feasibility of machine learning approaches for occupational health management by enabling identification of employees at risk for prolonged absenteeism. While showing promise for supporting personalized health interventions and resource allocation, implementation requires external validation across multiple organizations and careful consideration of ethical implications regarding employee privacy and algorithmic fairness.
About the journal:
Aimed at clinicians and researchers, the Journal of Occupational Medicine and Toxicology is a multi-disciplinary, open access journal which publishes original research on the clinical and scientific aspects of occupational and environmental health.
With high-quality peer review and quick decision times, we welcome submissions on the diagnosis, prevention, management, and scientific analysis of occupational diseases, injuries, and disability. The journal also covers the promotion of health of workers, their families, and communities, and ranges from rehabilitation to tropical medicine and public health aspects.