Cervical cancer screening uptake and its associated factors in Sub-Saharan Africa: a machine learning approach.

IF 3.3 | Zone 3 (Medicine) | JCR Q2 (Medical Informatics)

BMC Medical Informatics and Decision Making Pub Date : 2025-05-26 DOI:10.1186/s12911-025-03039-y

Fetlework Gubena Arage, Zinabu Bekele Tadese, Eliyas Addisu Taye, Tigist Kifle Tsegaw, Tsegasilassie Gebremariam Abate, Eyob Akalewold Alemu

BMC Medical Informatics and Decision Making, vol. 25(1), 197. Full-text PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12107765/pdf/

Abstract

Introduction: Cervical cancer, which includes squamous cell carcinoma and adenocarcinoma, is a leading cause of cancer-related death globally, particularly in low- and middle-income countries (LMICs). It is preventable through early screening, yet incidence and mortality are significantly higher in LMICs, where 94% of deaths occur. Poor implementation of screening programs, together with multiple health system barriers, leaves these countries with a high cervical cancer burden. Projections show cases and deaths from the disease increasing by 2030. Unlike conventional statistical tests, machine learning can capture complex, non-linear relationships among factors when predicting the outcome variable.

Method: Secondary data from the Demographic and Health Survey (DHS) for ten Sub-Saharan African countries were used to evaluate cervical cancer screening uptake among women aged 25-49 years. During cleaning, missing values and outliers were removed. Classes were balanced with the Synthetic Minority Oversampling Technique (SMOTE), and model hyperparameters were tuned by grid search, before the data were split into training and validation sets containing 80% and 20% of the records, respectively. Seven machine learning classification algorithms were used to predict screening uptake: Logistic Regression, Decision Tree Classifier, Random Forest, K-Nearest Neighbor, Gradient Boosting, AdaBoost, and Extra Trees. Model performance was evaluated using accuracy, precision, recall, and F1 score.
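The balancing and splitting steps described in the Method can be illustrated with a minimal sketch. It is not the authors' code: it uses only the Python standard library, a simplified SMOTE-style interpolation (real analyses typically rely on a library implementation such as imbalanced-learn), toy two-feature data, and hypothetical function names (`smote_like_oversample`, `train_validation_split`). The 0.8 training fraction is an assumption about the intended split.

```python
import math
import random


def smote_like_oversample(minority, n_new, k=3, seed=0):
    """Create n_new synthetic points by interpolating between minority samples.

    Each synthetic point lies on the line segment between a randomly chosen
    minority sample and one of its k nearest minority neighbours.
    """
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # Rank the other minority samples by Euclidean distance to the base.
        neighbours = sorted(
            (p for p in minority if p is not base),
            key=lambda p: math.dist(base, p),
        )[:k]
        neighbour = rng.choice(neighbours)
        gap = rng.random()  # interpolation weight in [0, 1)
        synthetic.append(
            tuple(b + gap * (n - b) for b, n in zip(base, neighbour))
        )
    return synthetic


def train_validation_split(rows, train_fraction=0.8, seed=0):
    """Shuffle rows and split them into training and validation subsets."""
    rng = random.Random(seed)
    shuffled = rows[:]
    rng.shuffle(shuffled)
    cut = int(round(train_fraction * len(shuffled)))
    return shuffled[:cut], shuffled[cut:]


# Toy data (hypothetical): two numeric features per woman, label 1 = screened.
screened = [(1.0, 2.0), (1.5, 1.8), (2.0, 2.2), (1.2, 2.5)]
not_screened = [(5.0 + 0.1 * i, 6.0 - 0.1 * i) for i in range(16)]

# Oversample the minority class until both classes have the same size.
extra = smote_like_oversample(screened, len(not_screened) - len(screened))
balanced = [(x, 1) for x in screened + extra] + [(x, 0) for x in not_screened]

train, validation = train_validation_split(balanced, train_fraction=0.8)
print(len(balanced), len(train), len(validation))
```

The sketch follows the order given in the abstract (balance, then split). A common alternative is to oversample only the training portion, so that the validation set contains no synthetic points.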

Result: Cervical cancer screening uptake was predicted from a weighted sample of 75,360 women aged 25-49 from the included Sub-Saharan African countries; the final dataset used for model building contained 53,461 observations. The Extra Trees Classifier achieved an accuracy of 94.13%, a precision of 95.76%, a recall of 94.12%, and an F1 score of 93.80%. Random Forest followed, with an accuracy of 93.87% and a precision of 99.18%. Health visits, proximity to health care, contraceptive use, urban residence, and media exposure were the most important predictors. The ensemble methods, Extra Trees and Random Forest, showed the best generalization, indicating that they work well on complex datasets and can help in devising targeted intervention strategies.
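For readers unfamiliar with the four reported metrics, the following minimal sketch shows how accuracy, precision, recall, and F1 score are derived from a binary confusion matrix. The counts are hypothetical and are not taken from the study, the function name `classification_metrics` is illustrative, and the abstract does not state whether the authors used positive-class or averaged metrics; positive-class metrics are assumed here.

```python
def classification_metrics(tp, fp, fn, tn):
    """Return accuracy, precision, recall and F1 for the positive class."""
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total if total else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    # F1 is the harmonic mean of precision and recall.
    f1 = (
        2 * precision * recall / (precision + recall)
        if (precision + recall)
        else 0.0
    )
    return {
        "accuracy": accuracy,
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }


# Hypothetical confusion-matrix counts, not taken from the study.
metrics = classification_metrics(tp=90, fp=10, fn=5, tn=95)
for name, value in metrics.items():
    print(f"{name}: {value:.2%}")
```

Because F1 is a harmonic mean, a positive-class F1 always lies between the corresponding precision and recall.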

Conclusion: This study demonstrates that ensemble machine learning models such as the Extra Trees Classifier and Random Forest are promising for predicting cervical cancer screening uptake among African women, with accuracies of 94.13% and 93.87%, respectively. Key predictors include healthcare access, sociocultural factors, media exposure, urban residence, and contraceptive use. The findings emphasize the need to reduce barriers to care and to use family planning visits and mass media to promote screening. These results should be validated in different populations before clinical integration through decision support systems.

Source journal

BMC Medical Informatics and Decision Making (Medicine: Medical Informatics)

CiteScore: 7.20

Self-citation rate: 5.70%

Articles published: 297

Review time: 1 month

About the journal: BMC Medical Informatics and Decision Making is an open access journal publishing original peer-reviewed research articles in relation to the design, development, implementation, use, and evaluation of health information technologies and decision-making for human health.