Cervical cancer screening uptake and its associated factor in Sub-Sharan Africa: a machine learning approach.
Authors: Fetlework Gubena Arage, Zinabu Bekele Tadese, Eliyas Addisu Taye, Tigist Kifle Tsegaw, Tsegasilassie Gebremariam Abate, Eyob Akalewold Alemu
Journal: BMC Medical Informatics and Decision Making, volume 25, issue 1, article 197 (JCR Q2, Medical Informatics; impact factor 3.3)
Published: 2025-05-26
DOI: 10.1186/s12911-025-03039-y (https://doi.org/10.1186/s12911-025-03039-y)
Full text (PMC): https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12107765/pdf/
Abstract
Introduction: Cervical cancer, which includes squamous cell carcinoma and adenocarcinoma, is a leading cause of cancer-related deaths globally, particularly in low- and middle-income countries (LMICs). It is preventable through early screening, yet incidence and mortality are significantly higher in LMICs, where 94% of deaths occur. Poor implementation of screening programs, together with multiple health system barriers, leaves these countries with a high cervical cancer burden. Projections show cases and deaths from the disease increasing by 2030. Unlike conventional statistical tests, machine learning can capture complex, non-linear relationships among the factors that predict the outcome variable.
Method: Secondary data from the Demographic and Health Survey (DHS) for ten Sub-Saharan African countries were used to evaluate cervical cancer screening uptake among women aged 25-49 years. During cleaning, missing values and outliers were removed. Classes were balanced with the Synthetic Minority Oversampling Technique (SMOTE), and hyperparameters were tuned via grid search, before the data were split into training and validation sets containing 80% and 20% of the samples, respectively. Seven machine learning classification algorithms were used to predict cervical cancer screening uptake: Logistic Regression, Decision Tree Classifier, Random Forest, K-Nearest Neighbor, Gradient Boosting, AdaBoost, and Extra Trees. Model performance was evaluated using accuracy, precision, recall, and F1 score.
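As an illustration of the class-balancing step, the following is a minimal, standard-library sketch of the core SMOTE idea: each synthetic minority sample is placed at a random point on the line segment between a minority sample and one of its nearest minority-class neighbours. The function name `smote`, the toy feature values, and the class counts are hypothetical; the abstract does not describe the authors' implementation or settings.

```python
import math
import random


def smote(minority, n_new, k=5, seed=0):
    """Generate n_new synthetic points by interpolating between minority
    samples and their k nearest minority-class neighbours."""
    if len(minority) < 2:
        raise ValueError("need at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        base = minority[i]
        # Rank the other minority points by Euclidean distance to the base point.
        others = [p for j, p in enumerate(minority) if j != i]
        others.sort(key=lambda p: math.dist(base, p))
        neighbour = rng.choice(others[:k])
        # Place the new point at a random position on the segment base -> neighbour.
        gap = rng.random()
        synthetic.append(tuple(b + gap * (n - b) for b, n in zip(base, neighbour)))
    return synthetic


# Hypothetical toy data: two numeric features per woman; "screened" is the minority class.
screened = [(1.0, 2.0), (1.5, 1.8), (2.0, 2.2), (1.2, 2.5)]
not_screened_count = 10  # assumed size of the majority class

new_points = smote(screened, n_new=not_screened_count - len(screened), k=3)
balanced_minority = screened + new_points
print(len(balanced_minority))  # 10
```

Because every synthetic point is a convex combination of two real minority samples, it always stays inside the region the minority class already occupies.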
Result: Cervical cancer screening uptake was predicted among 75,360 weighted samples of women aged 25-49 from the ten Sub-Saharan African countries, with a final dataset of 53,461 used for model building. The Extra Trees Classifier achieved an accuracy of 94.13%, a precision of 95.76%, a recall of 94.12%, and an F1-score of 93.80%, followed by Random Forest with an accuracy of 93.87% and a precision of 99.18%. Health visits, proximity to health care, contraceptive use, urban residence, and media exposure were the most important predictors. The ensemble methods, Extra Trees and Random Forest, showed the best generalization, indicating that they work well on complex datasets and can help devise targeted intervention strategies.
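For readers less familiar with the four evaluation metrics, the sketch below shows how accuracy, precision, recall, and F1 score follow from confusion-matrix counts for a binary outcome (screened vs. not screened). The counts are made up for illustration and are not the study's data. The abstract does not state whether its figures are per-class or averaged across classes, so this sketch assumes the plain positive-class definitions.

```python
def classification_metrics(tp, fp, fn, tn):
    """Compute accuracy, precision, recall and F1 for the positive class
    from true/false positive and negative counts."""
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total
    # Guard against division by zero when a class is never predicted or never present.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    # F1 is the harmonic mean of precision and recall.
    f1 = (2 * precision * recall / (precision + recall)) if (precision + recall) else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


# Hypothetical validation-set counts (not taken from the paper).
metrics = classification_metrics(tp=90, fp=10, fn=5, tn=95)
for name, value in metrics.items():
    print(f"{name}: {value:.4f}")
```

Precision penalizes false alarms (women predicted as screened who were not), while recall penalizes misses, which is why the two can diverge for the same model.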
Conclusion: This study demonstrates that ensemble machine learning models such as the Extra Trees Classifier and Random Forest are promising for predicting cervical cancer screening uptake among African women, with accuracies of 94.13% and 93.87%, respectively. Key predictors include healthcare access, sociocultural factors, media exposure, urban residence, and contraceptive use. The findings emphasize the need to reduce barriers to care and to use family planning visits and mass media to promote screening. These results should be validated in different populations to support clinical integration via decision support systems.
About the journal:
BMC Medical Informatics and Decision Making is an open access journal publishing original peer-reviewed research articles in relation to the design, development, implementation, use, and evaluation of health information technologies and decision-making for human health.