Identifying most important predictors for suicidal thoughts and behaviours among healthcare workers active during the Spain COVID-19 pandemic: a machine-learning approach.
Itxaso Alayo, Oriol Pujol, Jordi Alonso, Montse Ferrer, Franco Amigo, Ana Portillo-Van Diest, Enric Aragonès, Andrés Aragon Peña, Ángel Asúnsolo Del Barco, Mireia Campos, Meritxell Espuga, Ana González-Pinto, Josep Maria Haro, Nieves López-Fresneña, Alma D Martínez de Salázar, Juan D Molina, Rafael M Ortí-Lucas, Mara Parellada, José Maria Pelayo-Terán, Maria João Forjaz, Aurora Pérez-Zapata, José Ignacio Pijoan, Nieves Plana, Elena Polentinos-Castro, Maria Teresa Puig, Cristina Rius, Ferran Sanz, Cònsol Serra, Iratxe Urreta-Barallobre, Ronny Bruffaerts, Eduard Vieta, Víctor Pérez-Solá, Philippe Mortier, Gemma Vilagut
Epidemiology and Psychiatric Sciences, volume 34, e28. Published 2025-05-08. DOI: 10.1017/S2045796025000198 (https://doi.org/10.1017/S2045796025000198). Full text: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12090031/pdf/. Study registration: NCT04556565.
Abstract
Aims: Studies conducted during the COVID-19 pandemic found a high occurrence of suicidal thoughts and behaviours (STBs) among healthcare workers (HCWs). The current study aimed to (1) develop a machine learning-based prediction model for future STBs using data from a large prospective cohort of Spanish HCWs and (2) identify the variables that contribute most to the model's predictive accuracy.
Methods: This is a prospective, multicentre cohort study of Spanish HCWs active during the COVID-19 pandemic. A total of 8,996 HCWs participated in the web-based baseline survey (May-July 2020) and 4,809 in the 4-month follow-up survey. A total of 219 predictor variables were derived from the baseline survey. The outcome variable was any STB at the 4-month follow-up. Variables were selected using an L1-regularized linear Support Vector Classifier (SVC). A random forest model was then developed with 5-fold cross-validation, and two class-balancing techniques were tested: the Synthetic Minority Oversampling Technique (SMOTE) and undersampling of the majority class. The model was evaluated by the area under the Receiver Operating Characteristic curve (AUROC) and the area under the precision-recall curve. SHapley Additive exPlanations (SHAP) values were used to evaluate the overall contribution of each variable to the prediction of future STBs. Results were obtained separately by gender.
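The modelling steps named in the Methods can be illustrated with a minimal scikit-learn sketch. It is not the authors' code: the data are synthetic (generated to mimic 219 predictors and roughly 8% positives), the hyperparameters (`C=0.02`, 100 trees, the random seeds) are arbitrary assumptions, and the choice to run variable selection inside each cross-validation fold is ours, since the abstract does not say how the two steps were combined. SMOTE, undersampling and SHAP are left out to keep the example dependency-light.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import SelectFromModel
from sklearn.metrics import average_precision_score, roc_auc_score
from sklearn.model_selection import StratifiedKFold, cross_val_predict
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import LinearSVC

# Synthetic stand-in for the survey data (assumption): 219 predictors,
# about 8% of samples in the positive class.
X, y = make_classification(
    n_samples=2000, n_features=219, n_informative=15,
    weights=[0.92, 0.08], random_state=0,
)

pipeline = make_pipeline(
    StandardScaler(),
    # L1-regularized linear SVC: predictors whose coefficient is shrunk
    # to (near) zero are dropped before the forest is trained.
    SelectFromModel(LinearSVC(penalty="l1", dual=False, C=0.02, max_iter=5000)),
    RandomForestClassifier(n_estimators=100, random_state=0),
)

# 5-fold stratified cross-validation; each sample is scored by a model
# that never saw it during selection or training.
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
proba = cross_val_predict(pipeline, X, y, cv=cv, method="predict_proba")[:, 1]

auroc = roc_auc_score(y, proba)
auprc = average_precision_score(y, proba)  # one common estimator of PR-curve area
print(f"AUROC = {auroc:.2f}, AUPRC = {auprc:.2f}")
```

Wrapping selection and classifier in one pipeline keeps the held-out fold from influencing which variables are kept, which would otherwise inflate the cross-validated metrics.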
Results: The prevalence of STBs in HCWs at the 4-month follow-up was 7.9% (women = 7.8%, men = 8.2%). Thirty-four variables were selected by the L1 regularized linear SVC. The best results were obtained without data balancing techniques: AUROC = 0.87 (0.86 for women and 0.87 for men) and area under the precision-recall curve = 0.50 (0.55 for women and 0.45 for men). Based on SHAP values, the most important baseline predictors for any STB at the 4-month follow-up were the presence of passive suicidal ideation, the number of days in the past 30 days with passive or active suicidal ideation, the number of days in the past 30 days with binge eating episodes, the number of panic attacks (women only) and the frequency of intrusive thoughts (men only).
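To put the precision-recall result in context: a classifier whose scores carry no information has an expected area under the precision-recall curve roughly equal to the outcome prevalence, here about 0.079, so the reported 0.50 is roughly six times that baseline. The sketch below checks the baseline numerically on simulated labels and random scores; the sample size, the seed and the use of average precision as the area estimate are assumptions made for illustration and are not taken from the study.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100_000
prevalence = 0.079  # STB prevalence at follow-up reported in the Results

y = rng.random(n) < prevalence   # simulated outcome labels
scores = rng.random(n)           # uninformative risk scores

# Average precision: mean of precision@k over the ranks k that hold a positive.
order = np.argsort(-scores)
y_sorted = y[order]
precision_at_k = np.cumsum(y_sorted) / np.arange(1, n + 1)
baseline_auprc = precision_at_k[y_sorted].mean()

reported_auprc = 0.50
ratio = reported_auprc / prevalence
print(f"baseline AUPRC ~ {baseline_auprc:.3f}, reported / baseline ~ {ratio:.1f}")
```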
Conclusions: Machine learning-based prediction models for STBs in HCWs during the COVID-19 pandemic trained on web-based survey data present high discrimination and classification capacity. Future clinical implementations of this model could enable the early detection of HCWs at the highest risk for developing adverse mental health outcomes.
About the journal:
Epidemiology and Psychiatric Sciences is a prestigious international, peer-reviewed journal that has been publishing in Open Access format since 2020. Formerly known as Epidemiologia e Psichiatria Sociale and established in 1992 by Michele Tansella, the journal prioritizes highly relevant and innovative research articles and systematic reviews in the areas of public mental health and policy, mental health services and system research, as well as epidemiological and social psychiatry. Join us in advancing knowledge and understanding in these critical fields.