Corinne Mette, Dorian Verboux, Antoine Rachas, Gonzague Debeugny
Predicting the risk of becoming eligible for the disability pension: Machine learning methods applied to French health data
Sante Publique, published 2024-02-23. DOI: 10.3917/spub.236.0065 (https://doi.org/10.3917/spub.236.0065)
Abstract
Introduction: Receiving a disability pension entails health-related (physical and psychological) and social (loss of income) consequences for the person. It also has economic consequences for society, with expenditure rising since 2011 (+4.9% per year on average). Investing in preventive action against the loss of the ability to work should limit these consequences, but it requires identifying the people at risk. The development of artificial intelligence opens up prospects in this regard.
Purpose of the research: To identify, using supervised machine learning methods, people with a high probability of becoming eligible for the disability pension within the year, based on their socio-demographic and medical characteristics (pathologies, sick leave, drugs taken, and medical procedures).
Method: Among the beneficiaries of the French public welfare system aged 20–64 in 2017, we compared the 2014–2016 socio-demographic and medical characteristics of those who received a disability pension in 2017 and not before with those of people who did not receive a disability pension from 2014 to 2017. We tested how well logistic regression, decision trees, random forests, naive Bayes classifiers, and support vector machines separated these two groups. The models' performance was compared on accuracy, precision, sensitivity, specificity, and AUC (area under the curve). Finally, the predictive power of each individual factor was also measured using AUC.
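The five comparison criteria named in the Method can be sketched in a few lines of Python. This is a minimal illustration, not the authors' code: the function names, the 0.5 decision threshold, and the toy labels and scores are all assumptions made for the example, and AUC is computed here as the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative one (ties counting half).

```python
from typing import Dict, Sequence, Tuple


def confusion_counts(y_true: Sequence[int], y_pred: Sequence[int]) -> Tuple[int, int, int, int]:
    """Return (tp, fp, tn, fn) for labels coded 1 (pension) / 0 (no pension)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    return tp, fp, tn, fn


def classification_metrics(y_true: Sequence[int], y_pred: Sequence[int]) -> Dict[str, float]:
    """Accuracy, precision, sensitivity and specificity from hard predictions."""
    tp, fp, tn, fn = confusion_counts(y_true, y_pred)
    return {
        "accuracy": (tp + tn) / (tp + fp + tn + fn),
        "precision": tp / (tp + fp) if tp + fp else 0.0,
        "sensitivity": tp / (tp + fn) if tp + fn else 0.0,
        "specificity": tn / (tn + fp) if tn + fp else 0.0,
    }


def auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as the share of (positive, negative) pairs ranked correctly; ties count half."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative case")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Toy data invented for illustration only (not from the study).
y_true = [1, 1, 1, 0, 0, 0, 0, 0]
scores = [0.9, 0.6, 0.3, 0.7, 0.4, 0.2, 0.1, 0.05]
y_pred = [1 if s >= 0.5 else 0 for s in scores]  # assumed threshold of 0.5

metrics = classification_metrics(y_true, y_pred)
metrics["auc"] = auc(y_true, scores)
print(metrics)
```

Accuracy, precision, sensitivity, and specificity depend on the chosen threshold, whereas AUC summarizes ranking quality across all thresholds, which is one reason a model can lead on some criteria and trail on others.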
Results: The boosted logistic regression performed best on three of the five criteria but had low sensitivity. Support vector machines achieved the best sensitivity, with an accuracy close to that of the boosted logistic regression but lower precision and specificity. Random forests offered the best discriminatory ability. The naive Bayes classifier performed worst. The factors most predictive of becoming eligible for the disability pension were having 30 days or more off sick in 2014, 2015, and 2016, and being aged 55 to 64.
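Ranking individual factors by AUC, as described in the Method, can be illustrated as follows. This is a hedged sketch, assuming each factor is a binary (0/1) indicator used alone as the score; in that case the AUC reduces to the mean of the factor's sensitivity and specificity. The factor names and all values below are hypothetical toy data, not the study's variables or results.

```python
from typing import Dict, List, Sequence, Tuple


def binary_factor_auc(y_true: Sequence[int], factor: Sequence[int]) -> float:
    """AUC of a single 0/1 factor used alone as the score.

    For a binary score, AUC (with ties counted half) equals
    (sensitivity + specificity) / 2.
    """
    pos = [f for t, f in zip(y_true, factor) if t == 1]
    neg = [f for t, f in zip(y_true, factor) if t == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    sensitivity = sum(pos) / len(pos)        # share of positives with the factor
    specificity = 1 - sum(neg) / len(neg)    # share of negatives without it
    return (sensitivity + specificity) / 2


# Hypothetical toy data: 4 people who became eligible, 6 who did not.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
factors: Dict[str, List[int]] = {
    "long_sick_leave": [1, 1, 1, 0, 1, 0, 0, 0, 0, 0],
    "older_age_band":  [1, 1, 0, 0, 1, 1, 0, 0, 0, 0],
    "uninformative":   [1, 0, 1, 0, 1, 0, 1, 0, 1, 0],
}

# Sort factors from most to least discriminating.
ranking: List[Tuple[str, float]] = sorted(
    ((name, binary_factor_auc(y_true, values)) for name, values in factors.items()),
    key=lambda item: item[1],
    reverse=True,
)
for name, value in ranking:
    print(f"{name}: AUC = {value:.3f}")
```

A factor with an AUC of 0.5 carries no discriminating information on its own, so the distance from 0.5 gives a simple, model-free way to compare factors.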
Conclusion: Supervised learning methods appear relevant for identifying people with the highest probability of becoming eligible for the disability pension and, more broadly, for steering public and social policies.
About the journal:
The journal Santé Publique is aimed at everyone working in public health, whether decision-makers, health professionals, field workers, researchers, teachers, or trainers. It publishes research, evaluations, analyses of actions, reflections on health interventions, and opinion pieces in the fields of public health and health services analysis, the social sciences, and social work.
Santé Publique is a peer-reviewed, multidisciplinary, general-interest journal that publishes on all public health topics, including: access to and use of care, social determinants of and inequalities in health, prevention, health education, health promotion, organization of care, environment, training of health professionals, nutrition, health policy, professional practice, quality of care, health risk management, representations and perceived health, school health, occupational health, health systems, information systems, health surveillance, determinants of health care consumption, organization and economics of the various care-producing sectors (hospitals, medicines, etc.), medico-economic evaluation of care or prevention activities and of health programs, resource planning, and regulatory and financing policies.