Identifying Predictors of Cervical Cancer Screening Uptake in Sub-Saharan Africa Using Machine Learning: Cross-Sectional Study
Nebebe Demis Baykemagn, Mekuriaw Nibret Aweke, Amare Mesfin, Lemlem Daniel Baffa, Muluken Chanie Agimas, Habtamu Wagnew Abuhay, Dagnew Getnet Adugna, Tewodros Getaneh Alemu, Alemu Teshale Bicha, Gebrie Getu Alemu
JMIR Public Health and Surveillance, 11, e71677. Published 2025-09-17.
DOI: 10.2196/71677
PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12443358/pdf/
Citations: 0
Abstract
Background: Cervical cancer is the fourth most common cancer among women, accounting for approximately 660,000 new diagnoses and 350,000 deaths worldwide. Effective early screening has been shown to reduce cervical cancer incidence by up to 80% and to prevent more than 40% of new cases.
Objective: This study aims to assess a machine learning-based prediction model and identify the key predictors influencing cervical cancer screening uptake among women aged 30-49 years in sub-Saharan Africa.
Methods: This study used a weighted sample of 33,952 women from the 2022 Demographic and Health Surveys of Ghana, Kenya, Mozambique, and Tanzania. STATA version 17 (StataCorp) and Python 3.10 (Python Software Foundation) were used for data preprocessing and analysis. Features were scaled with min-max scaling and standardization (MinMax and standard scalers), and recursive feature elimination was used for feature selection. The data were split into training and test sets in an 80:20 ratio. Class imbalance was handled by combining Tomek links (an undersampling technique) with random oversampling. Seven models were trained on both the balanced and the unbalanced data. Models were evaluated using the area under the receiver operating characteristic curve, accuracy, and the confusion matrix.
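The scaling, splitting, and class-imbalance steps described above can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the authors' pipeline: the 80:20 split is assumed to be stratified by outcome, scalers are assumed to be fit on the training split only, Tomek links are assumed to be removed before random oversampling, and only the majority-class member of each link is dropped (the default behaviour of imbalanced-learn's `TomekLinks`). Recursive feature elimination and the 7 classifiers are omitted. All function names are hypothetical, and the toy data is randomly generated, not Demographic and Health Survey data.

```python
import math
import random
from collections import Counter
from typing import List, Sequence, Tuple

Row = List[float]


def fit_min_max(rows: Sequence[Row]) -> Tuple[Row, Row]:
    """Learn per-column minimum and maximum (call on the training split only)."""
    cols = list(zip(*rows))
    return [min(c) for c in cols], [max(c) for c in cols]


def min_max_transform(rows: Sequence[Row], lo: Row, hi: Row) -> List[Row]:
    """Map each column to [0, 1] with training bounds; test values may fall outside."""
    return [[(v - l) / (h - l) if h > l else 0.0 for v, l, h in zip(r, lo, hi)]
            for r in rows]


def fit_standard(rows: Sequence[Row]) -> Tuple[Row, Row]:
    """Learn per-column mean and population standard deviation."""
    cols = list(zip(*rows))
    means = [sum(c) / len(c) for c in cols]
    sds = [math.sqrt(sum((v - m) ** 2 for v in c) / len(c))
           for c, m in zip(cols, means)]
    return means, sds


def standard_transform(rows: Sequence[Row], means: Row, sds: Row) -> List[Row]:
    """Center to mean 0 and scale to unit variance; constant columns map to 0."""
    return [[(v - m) / s if s > 0 else 0.0 for v, m, s in zip(r, means, sds)]
            for r in rows]


def stratified_split(X, y, test_frac=0.2, seed=0):
    """80:20 split keeping the class ratio similar in both parts (assumed, not stated)."""
    rng = random.Random(seed)
    train_idx, test_idx = [], []
    for label in sorted(set(y)):
        idx = [i for i, t in enumerate(y) if t == label]
        rng.shuffle(idx)
        n_test = round(len(idx) * test_frac)
        test_idx += idx[:n_test]
        train_idx += idx[n_test:]

    def pick(ids):
        return [X[i] for i in ids], [y[i] for i in ids]

    return pick(train_idx), pick(test_idx)


def remove_tomek_links(X, y, majority):
    """Drop the majority-class member of every Tomek link (O(n^2) brute force)."""
    def nearest(i):
        return min((j for j in range(len(X)) if j != i),
                   key=lambda j: math.dist(X[i], X[j]))

    nn = [nearest(i) for i in range(len(X))]
    # A Tomek link is a pair of opposite-class points that are mutual nearest neighbours.
    drop = {i if y[i] == majority else j
            for i, j in enumerate(nn)
            if y[i] != y[j] and nn[j] == i}
    keep = [i for i in range(len(X)) if i not in drop]
    return [X[i] for i in keep], [y[i] for i in keep]


def random_oversample(X, y, seed=0):
    """Duplicate random rows of each smaller class until every class matches the largest."""
    rng = random.Random(seed)
    counts = Counter(y)
    target = max(counts.values())
    X_out, y_out = [list(r) for r in X], list(y)
    for label, n in counts.items():
        pool = [r for r, t in zip(X, y) if t == label]
        for _ in range(target - n):
            X_out.append(list(rng.choice(pool)))
            y_out.append(label)
    return X_out, y_out


if __name__ == "__main__":
    rng = random.Random(42)
    # Hypothetical toy data: 2 numeric features, roughly 13% positive labels.
    X = [[rng.gauss(0, 1), rng.gauss(0, 1)] for _ in range(200)]
    y = [1 if rng.random() < 0.13 else 0 for _ in range(200)]
    (X_tr, y_tr), (X_te, y_te) = stratified_split(X, y)
    lo, hi = fit_min_max(X_tr)
    X_tr, X_te = min_max_transform(X_tr, lo, hi), min_max_transform(X_te, lo, hi)
    # Resample the training split only; the test split keeps its natural imbalance.
    X_bal, y_bal = random_oversample(*remove_tomek_links(X_tr, y_tr, majority=0))
    print("before:", Counter(y_tr), "after:", Counter(y_bal))
```

Resampling is applied only after the split so that duplicated minority rows cannot leak into the test set and inflate the evaluation metrics.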
Results: Cervical cancer screening uptake in sub-Saharan Africa was 13%, lower than reported in previous studies. Random forest was the best-performing model, with an accuracy of 78%, an area under the curve of 0.86, an F1-score of 79%, a recall of 81%, and a precision of 77%. Shapley Additive Explanations (SHAP) analysis, visualized with a waterfall plot, identified wealth status, awareness of sexually transmitted infections, HIV testing exposure, age at first sexual intercourse, educational level, place of residence, smartphone ownership, having a single sexual partner, and previous health status as predictors of cervical cancer screening.
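The reported random forest F1-score is consistent with its precision and recall, because F1 is their harmonic mean: 2 × 0.77 × 0.81 / (0.77 + 0.81) ≈ 0.79. The sketch below shows, in plain Python, how these summary figures follow from a confusion matrix, and computes the area under the ROC curve with the rank (Mann-Whitney) formulation. It is not the authors' code; in a scikit-learn workflow the equivalent functions would be `precision_score`, `recall_score`, `f1_score`, and `roc_auc_score` from `sklearn.metrics`. The function names here are hypothetical, and labels are assumed to be coded 1 = screened, 0 = not screened.

```python
from typing import Dict, Sequence, Tuple


def confusion_counts(y_true: Sequence[int], y_pred: Sequence[int]) -> Tuple[int, int, int, int]:
    """Return (TP, TN, FP, FN) for binary labels where 1 = screened."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    return tp, tn, fp, fn


def harmonic_f1(precision: float, recall: float) -> float:
    """F1-score: the harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall) if precision + recall else 0.0


def summary_metrics(tp: int, tn: int, fp: int, fn: int) -> Dict[str, float]:
    """Accuracy, precision, recall, and F1 derived from confusion-matrix counts."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return {
        "accuracy": (tp + tn) / (tp + tn + fp + fn),
        "precision": precision,
        "recall": recall,
        "f1": harmonic_f1(precision, recall),
    }


def roc_auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as the chance a random positive scores above a random negative (ties count 1/2)."""
    pos = [s for s, t in zip(scores, y_true) if t == 1]
    neg = [s for s, t in zip(scores, y_true) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    # Consistency check on the reported precision (0.77) and recall (0.81).
    print(round(harmonic_f1(0.77, 0.81), 2))  # 0.79
    print(roc_auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]))  # 0.75
```

Because AUC is computed from predicted scores rather than thresholded labels, it summarizes ranking quality across all thresholds, whereas accuracy, precision, recall, and F1 depend on the single threshold used to build the confusion matrix.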
Conclusions: Improving education and awareness, expanding access to screening (especially in rural areas), leveraging both digital health and community-based outreach, integrating screening with other health services, and addressing socioeconomic barriers are recommended strategies to increase cervical cancer screening rates in sub-Saharan Africa.
About the journal:
JMIR Public Health & Surveillance (JPHS) is a peer-reviewed scholarly journal indexed in PubMed. It focuses on the intersection of technology and innovation with public health, covering topics such as public health informatics, surveillance systems, rapid reports, participatory epidemiology, infodemiology, infoveillance, digital disease detection, digital epidemiology, electronic public health interventions, mass media and social media campaigns, health communication, and emerging population health analysis systems and tools.