Integrated feature selection-based stacking ensemble model using optimized hyperparameters to predict breast cancer with smart web application

Rajib Kumar Halder, Marzana Akter Lima, Mohammed Nasir Uddin, Md. Aminul Islam, Adri Saha
Journal: Clinical eHealth, Volume 8, Pages 146-161
Publication date: 2025-08-07
Publication type: Journal Article
DOI: 10.1016/j.ceh.2025.08.001
URL: https://www.sciencedirect.com/science/article/pii/S2588914125000206

Abstract

Breast cancer is a leading cause of morbidity and mortality among women worldwide, arising from malignant cell transformations in breast tissue. Early detection is paramount because it improves survival rates and reduces the complexity and cost of treatment. Machine learning has reshaped this field by providing more precise, efficient, and personalized diagnostic methods. Our research aims to develop a robust predictive model for breast cancer classification through rigorous preprocessing, diverse feature selection techniques, and advanced ensemble learning strategies. A central component of our methodology is a Stacking Classifier that combines multiple base classifiers, with hyperparameters tuned using RandomizedSearchCV. This process enhances the model's accuracy, reliability, and generalizability. Our feature selection process involves three families of methods: filter, wrapper, and embedded. By applying these techniques, we identify the most critical features, namely those consistently selected across all three methods. Only these features are then used to train the model, so that our approach focuses on the most relevant attributes for breast cancer classification. On the Wisconsin Breast Cancer Dataset from the UCI repository, which comprises 569 patient records, our model demonstrates exceptional performance: it achieves an accuracy of 100% and an AUC-ROC of 1.00, indicating perfect sensitivity and specificity on this dataset. The proposed framework was evaluated using two distinct datasets: the Wisconsin Prognostic Breast Cancer (WPBC) dataset and the Wisconsin Original Breast Cancer (WOBC) dataset. This model stands out for its potential to enhance early detection and treatment strategies, marking a significant advance in applying machine learning to improve healthcare outcomes. Additionally, we have developed a user-friendly web app for breast cancer detection using our predictive model.
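
The pipeline described in the abstract (consensus feature selection followed by a tuned stacking ensemble) can be sketched roughly as below. This is an illustrative sketch only, not the authors' implementation: the abstract does not name the individual selectors, the base classifiers, the meta-learner, or the search space, so the ANOVA F-test filter, RFE wrapper, random-forest embedded selector, the random forest and SVC base learners, the logistic-regression meta-learner, `k=15`, and every hyperparameter range are assumptions chosen for brevity. The helper names `consensus_features`, `build_search`, and `run_demo` are likewise hypothetical. The data come from scikit-learn's bundled 569-record Wisconsin diagnostic dataset.

```python
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier, StackingClassifier
from sklearn.feature_selection import RFE, SelectFromModel, SelectKBest, f_classif
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import RandomizedSearchCV, train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC


def consensus_features(X, y, k=15, seed=0):
    """Indices of features picked by a filter, a wrapper, AND an embedded selector.

    The three concrete selectors and k are assumptions, not taken from the paper.
    """
    X_std = StandardScaler().fit_transform(X)
    # Filter method: rank features by the ANOVA F-statistic.
    filt = SelectKBest(f_classif, k=k).fit(X_std, y)
    # Wrapper method: recursive feature elimination around logistic regression.
    wrap = RFE(LogisticRegression(max_iter=2000), n_features_to_select=k).fit(X_std, y)
    # Embedded method: keep the k largest random-forest importances.
    forest = RandomForestClassifier(n_estimators=50, random_state=seed)
    emb = SelectFromModel(forest, max_features=k, threshold=-np.inf).fit(X_std, y)
    # Keep only the features on which all three selectors agree.
    mask = filt.get_support() & wrap.get_support() & emb.get_support()
    return np.flatnonzero(mask)


def build_search(seed=0):
    """Stacking ensemble wrapped in a randomized hyperparameter search (assumed setup)."""
    stack = StackingClassifier(
        estimators=[
            ("rf", RandomForestClassifier(n_estimators=50, random_state=seed)),
            ("svc", make_pipeline(StandardScaler(), SVC(random_state=seed))),
        ],
        final_estimator=LogisticRegression(max_iter=2000),
        cv=3,
    )
    # Illustrative search space; the paper's actual ranges are not given in the abstract.
    param_distributions = {
        "rf__max_depth": [None, 4, 8],
        "svc__svc__C": [0.1, 1.0, 10.0],
        "final_estimator__C": [0.1, 1.0, 10.0],
    }
    return RandomizedSearchCV(
        stack, param_distributions, n_iter=4, cv=3,
        scoring="roc_auc", random_state=seed,
    )


def run_demo(seed=0):
    """Select features on the training split only, tune the stack, score on held-out data."""
    X, y = load_breast_cancer(return_X_y=True)
    X_tr, X_te, y_tr, y_te = train_test_split(
        X, y, test_size=0.2, stratify=y, random_state=seed
    )
    cols = consensus_features(X_tr, y_tr, seed=seed)
    search = build_search(seed).fit(X_tr[:, cols], y_tr)
    test_auc = search.score(X_te[:, cols], y_te)  # AUC-ROC, per the scoring argument
    return cols, search.best_params_, test_auc
```

Calling `run_demo()` returns the consensus feature indices, the best sampled hyperparameters, and the held-out AUC-ROC. Selecting features on the training split alone is a deliberate choice in this sketch: running the selectors on the full dataset before splitting would leak test information into the model and inflate the reported score.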

Source journal metrics: CiteScore 8.10; self-citation rate 0.00%; articles published 0.