Predicting breast self-examination awareness in Sub-Saharan Africa using machine learning.
Authors: Nebebe Demis Baykemagn, Meron Asmamaw Alemayehu, Tirualem Zeleke Yehuala, Agmasie Damtew Walle, Andualem Enyew Gedefaw, Abraham Keffale Mengistu
Journal: Scientific Reports, volume 15, issue 1, article 19604 (Journal Article, published 2025-06-04)
DOI: https://doi.org/10.1038/s41598-025-03112-6
Full-text PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12137617/pdf/
Impact factor: 3.9; JCR: Q1 (Multidisciplinary Sciences)
Citations: 0
Abstract
Predicting breast self-examination awareness in Sub-Saharan Africa using machine learning.
Breast self-examination is a low-cost approach that reduces the expenses associated with medical equipment, healthcare practitioners' fees, transportation to health facilities, and other indirect costs. It also improves access to health services and helps avert the transmission of infectious illnesses in low- and middle-income countries, making it a sustainable route to public health gains. We used a total weighted sample of 133,425 from the Demographic and Health Survey, with STATA version 17, MS Excel 2016, and Python 3.10 for data management. Min-Max scaling and standard scaling were used for variable scaling, and Recursive Feature Elimination for feature selection. The data were split 80:20 for training and testing and balanced using Tomek Links combined with Random Over-Sampling. Model performance was evaluated using ROC-AUC, accuracy, F1 score, recall, and precision. The Decision Tree model performed best, with an accuracy of 82% and an AUC of 0.87. We attribute this to its capacity to represent non-linear associations and interactions in the data, which more conventional models such as logistic regression struggled to capture. Woman's age, smartphone availability, marital status, health facility visits, HIV testing, number of children, examination by healthcare providers, wealth status, place of residence, mother's occupation, education level, social media use, health status, and distance to health facilities were predictors of breast self-examination.
In conclusion, the Decision Tree was the top-performing model, with an AUC of 0.87 and an accuracy of 82%, owing to its ability to capture non-linear relationships between predictors and the target variable, its use of ensemble averaging and random feature selection to reduce variance and overfitting, and its inherent feature importance mechanism, which keeps it robust to irrelevant features. Based on these findings, to increase awareness of breast self-examination (BSE), we recommend raising awareness among community leaders about breast cancer and the benefits of self-examination, deploying mobile health clinics and outreach programs, training health extension workers in proper BSE so that they can share it with the community, and launching radio and television campaigns in local languages to reach large audiences.
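The preprocessing and evaluation steps named in the abstract can be illustrated with a minimal, standard-library Python sketch. It is not the authors' code: the toy rows, the function names, and the order of operations (Min-Max scaling, then Tomek Link removal, then random over-sampling) are assumptions made for illustration, and the toy rows stand in for the training portion of an 80:20 split. Feature selection and the decision tree itself are omitted; the scores passed to the metrics are made-up values standing in for a classifier's predicted probabilities.

```python
import random
from math import dist


def min_max_scale(rows):
    """Rescale every column to [0, 1]; constant columns become 0.0."""
    columns = list(zip(*rows))
    lows = [min(col) for col in columns]
    spans = [(max(col) - low) or 1.0 for col, low in zip(columns, lows)]
    return [[(v - low) / span for v, low, span in zip(row, lows, spans)]
            for row in rows]


def remove_tomek_links(X, y, majority):
    """Drop majority-class rows that take part in a Tomek link.

    Two rows form a Tomek link when each is the other's nearest
    neighbour and their labels differ.
    """
    def nearest(i):
        return min((j for j in range(len(X)) if j != i),
                   key=lambda j: dist(X[i], X[j]))

    neighbour = [nearest(i) for i in range(len(X))]
    drop = {i for i, j in enumerate(neighbour)
            if neighbour[j] == i and y[i] != y[j] and y[i] == majority}
    keep = [i for i in range(len(X)) if i not in drop]
    return [X[i] for i in keep], [y[i] for i in keep]


def random_oversample(X, y, minority, rng):
    """Duplicate randomly chosen minority rows until the classes are equal."""
    minority_idx = [i for i, label in enumerate(y) if label == minority]
    deficit = len(y) - 2 * len(minority_idx)
    extra = [rng.choice(minority_idx) for _ in range(max(deficit, 0))]
    return X + [X[i] for i in extra], y + [y[i] for i in extra]


def classification_metrics(y_true, y_pred):
    """Accuracy, precision, recall and F1 for binary labels (1 = positive)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == 1 and p == 1 for t, p in pairs)
    fp = sum(t == 0 and p == 1 for t, p in pairs)
    fn = sum(t == 1 and p == 0 for t, p in pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    accuracy = sum(t == p for t, p in pairs) / len(pairs)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


def roc_auc(y_true, scores):
    """AUC as the chance a positive row outscores a negative one (ties count half)."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical toy rows: [age, number of children]; label 1 = practises BSE.
raw = [[18, 0], [20, 0], [22, 1], [24, 0], [27, 0], [28, 0], [36, 9], [38, 10]]
labels = [0, 0, 0, 0, 0, 1, 1, 1]

rng = random.Random(0)
scaled = min_max_scale(raw)
X_clean, y_clean = remove_tomek_links(scaled, labels, majority=0)
X_bal, y_bal = random_oversample(X_clean, y_clean, minority=1, rng=rng)
print(len(y_clean), y_bal.count(0), y_bal.count(1))

# Made-up scores standing in for a classifier's predicted probabilities.
y_true = [1, 1, 1, 0, 0, 0, 0, 1]
scores = [0.9, 0.8, 0.4, 0.3, 0.2, 0.6, 0.1, 0.7]
y_pred = [int(s >= 0.5) for s in scores]
print(classification_metrics(y_true, y_pred), roc_auc(y_true, scores))
```

In the toy data the rows aged 27 (label 0) and 28 (label 1) are mutual nearest neighbours with different labels, so the majority-class row is removed before the minority class is over-sampled. The brute-force neighbour search is quadratic in the number of rows and would not scale to a survey-sized sample; the imbalanced-learn classes `TomekLinks` and `RandomOverSampler` would be the usual choice at that scale.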
About the journal:
We publish original research from all areas of the natural sciences, psychology, medicine and engineering. You can learn more about what we publish by browsing our specific scientific subject areas below or explore Scientific Reports by browsing all articles and collections.
Scientific Reports has a 2-year impact factor of 4.380 (2021) and is the 6th most-cited journal in the world, with more than 540,000 citations in 2020 (Clarivate Analytics, 2021).
•Engineering
Engineering covers all aspects of engineering, technology, and applied science. It plays a crucial role in the development of technologies to address some of the world's biggest challenges, helping to save lives and improve the way we live.
•Physical sciences
Physical sciences are those academic disciplines that aim to uncover the underlying laws of nature — often written in the language of mathematics. It is a collective term for areas of study including astronomy, chemistry, materials science and physics.
•Earth and environmental sciences
Earth and environmental sciences cover all aspects of Earth and planetary science and broadly encompass solid Earth processes, surface and atmospheric dynamics, Earth system history, climate and climate change, marine and freshwater systems, and ecology. It also considers the interactions between humans and these systems.
•Biological sciences
Biological sciences encompass all the divisions of natural sciences examining various aspects of vital processes. The concept includes anatomy, physiology, cell biology, biochemistry and biophysics, and covers all organisms, from microorganisms to animals and plants.
•Health sciences
The health sciences study health, disease and healthcare. This field of study aims to develop knowledge, interventions and technology for use in healthcare to improve the treatment of patients.