On the interpretability of the SVM model for predicting infant mortality in Bangladesh
Md Abu Sayeed, Azizur Rahman, Atikur Rahman, Rumana Rois
Journal of Health, Population, and Nutrition, 43(1):170. Journal Article. Published 2024-10-27.
DOI: https://doi.org/10.1186/s41043-024-00646-9
Full text (PDF): https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11520049/pdf/
Abstract
Background: Although machine learning (ML) models are popular for their strong predictive performance, they are often avoided because their predictions lack intuition and explanation. Interpretable ML is, therefore, an emerging research field that combines the performance and interpretability of ML models to create comprehensive solutions for complex decision-making analysis. Infant mortality, meanwhile, is a global public health concern affecting health, social well-being, socio-economic development, and healthcare services. This study employs advanced interpretable ML techniques to predict and understand the factors affecting infant mortality in Bangladesh, overcoming the shortcomings of the conventional logistic regression (LR) model.
Methods: An interpretable support vector machine (SVM), explained with a global surrogate model and the local individual conditional expectation (ICE) technique, was used to reveal significant characteristics of infant mortality using data from the Bangladesh Demographic and Health Survey (BDHS) 2017-18. To investigate the intricate decision-making analysis of infant mortality, we applied SVM and LR techniques with hyperparameter tuning. The models' performance was first assessed over 100 permutations using the receiver operating characteristic (ROC) curve, run-time, and confusion matrix parameters. Afterward, the SVM model's model-agnostic explanation and the LR model's interpretation were compared to gain further insight.
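The two model-agnostic techniques named in the Methods can be sketched in a few lines. The block below is a minimal illustration only, not the study's code: `black_box` is a hypothetical stand-in for a fitted SVM, its three binary features (working mother, polluted cooking fuel, birth interval under two years) and its coefficients are invented, and the surrogate is restricted to a depth-1 rule for brevity.

```python
from itertools import product

# Hypothetical black-box scorer standing in for a fitted SVM. The features
# (working, polluted_fuel, short_interval) and the coefficients are invented
# for illustration; they are NOT estimates from the BDHS data.
def black_box(row):
    working, polluted_fuel, short_interval = row
    return (0.1
            + 0.5 * working * polluted_fuel
            + 0.6 * (1 - working) * polluted_fuel * short_interval)

def ice_curves(predict, rows, feature_index, grid):
    """ICE: for each row, sweep one feature over the grid while holding the
    other features fixed, and record the prediction at each grid value."""
    curves = []
    for row in rows:
        curve = []
        for value in grid:
            modified = list(row)
            modified[feature_index] = value
            curve.append(predict(tuple(modified)))
        curves.append(curve)
    return curves

def fit_stump_surrogate(predict, rows, threshold=0.5):
    """Global surrogate (depth-1): pick the single binary feature and value
    that best reproduce the black box's own labels. Fidelity is the share of
    rows on which surrogate and black box agree."""
    labels = [int(predict(r) >= threshold) for r in rows]
    best = None
    for j in range(len(rows[0])):
        for positive_value in (0, 1):
            agree = sum(int(r[j] == positive_value) == y
                        for r, y in zip(rows, labels))
            fidelity = agree / len(rows)
            if best is None or fidelity > best[2]:
                best = (j, positive_value, fidelity)
    return best  # (feature index, value predicted positive, fidelity)

# All 8 combinations of the three binary features.
rows = list(product((0, 1), repeat=3))
curves = ice_curves(black_box, rows, feature_index=1, grid=(0, 1))
surrogate = fit_stump_surrogate(black_box, rows)
```

Each ICE curve describes one individual, so curves that diverge across rows signal interactions that an averaged view would hide. The surrogate is trained on the black box's predictions rather than on the true outcomes, which is why its quality is reported as fidelity to the black box and not as accuracy.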
Results: Averaged over the 100 permutations, the LR model (accuracy = 0.9105, precision = NaN, sensitivity = 0, specificity = 1, F1-score = 0, area under the ROC curve (AUC) = 0.6780, run-time = 0.0832) outperformed the SVM model (accuracy = 0.8470, precision = 0.1062, sensitivity = 0.0949, specificity = 0.9209, F1-score = 0.1000, AUC = 0.5632, run-time = 0.0254) on accuracy, specificity, and AUC in predicting infant mortality, but the LR model had a slower run-time and was unable to predict any positive cases. The LR interpretation indicated that infant mortality rates were lower when mothers gave birth more than two years after the preceding birth, and lower for mothers with higher educational attainment, overweight or obese mothers, working mothers, and families using polluted cooking fuel. The local ICE interpretability technique, which depicts individual influences on the average likelihood of dying before the first birthday, indicated for the interpretable SVM model that mothers with normal BMIs, mothers giving birth within two years, mothers using less polluted cooking fuel, working mothers, and mothers of male infants were more likely to experience infant death. The global surrogate model for the SVM also indicated higher infant death rates for working mothers who used polluted cooking fuel at home and for working mothers who used less polluted cooking fuel but had an interval of more than two years between pregnancies. Infant death rates were also higher among non-working mothers who used polluted cooking fuel and gave birth within two years of the preceding birth.
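The LR figures reported above (precision = NaN, sensitivity = 0, specificity = 1, F1-score = 0) are what a classifier produces when it labels every case negative. The sketch below works through that arithmetic. The counts are hypothetical, chosen only so that accuracy equals the reported 0.9105; they are not the study's confusion matrix. F1 is computed from counts as 2TP / (2TP + FP + FN), an assumption that matches the reported F1 of 0; the harmonic-mean form would be undefined here.

```python
import math

def confusion_metrics(tp, fp, fn, tn):
    """Standard confusion-matrix metrics; a zero denominator yields NaN."""
    def ratio(num, den):
        return num / den if den else math.nan
    return {
        "accuracy": ratio(tp + tn, tp + fp + fn + tn),
        "precision": ratio(tp, tp + fp),         # 0/0 when nothing is predicted positive
        "sensitivity": ratio(tp, tp + fn),
        "specificity": ratio(tn, tn + fp),
        "f1": ratio(2 * tp, 2 * tp + fp + fn),   # count-based form of F1
    }

# Hypothetical counts: 10,000 infants, 895 deaths, and a model that predicts
# "survived" for every one of them.
all_negative = confusion_metrics(tp=0, fp=0, fn=895, tn=9105)
```

With heavily imbalanced outcomes, accuracy mostly reflects the share of the majority class, so sensitivity and precision are the figures that show whether any deaths are actually being identified.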
Conclusions: The interpretable SVM model shows that global interpretations help clinicians understand the entire conditional distribution, while local interpretations focus on specific instances; the two provide different insights into model behavior. Interpretable ML models aid policymakers, stakeholders, and families in understanding and preventing infant deaths by improving policy-making strategies and establishing effective family counseling services.
About the journal:
Journal of Health, Population and Nutrition brings together research on all aspects of issues related to population, nutrition and health. The journal publishes articles across a broad range of topics including global health, maternal and child health, nutrition, common illnesses and determinants of population health.