Rajib Kumar Halder, Marzana Akter Lima, Mohammed Nasir Uddin, Md. Aminul Islam, Adri Saha
Clinical eHealth, vol. 8, pp. 146-161. Journal article, published 2025-08-07. DOI: 10.1016/j.ceh.2025.08.001. https://www.sciencedirect.com/science/article/pii/S2588914125000206
Integrated feature selection-based stacking ensemble model using optimized hyperparameters to predict breast cancer with smart web application
Breast cancer is a leading cause of morbidity and mortality among women worldwide, arising from malignant cell transformations in breast tissue. Early detection is paramount because it improves survival rates and reduces the complexity and cost of treatment. Machine learning has revolutionized this field, providing more precise, efficient, and personalized diagnostic methods. Our research aims to develop a robust predictive model for breast cancer classification through rigorous preprocessing, diverse feature selection techniques, and advanced ensemble learning strategies. A central component of our methodology is a Stacking Classifier that combines multiple base classifiers, with hyperparameters tuned using RandomizedSearchCV. This process enhances the model’s accuracy, reliability, and generalizability. Our feature selection process applies three families of methods: filter, wrapper, and embedded. We retain the features that are consistently selected by all three and use them to train the model, so that the approach focuses on the features most relevant to breast cancer classification. On the Wisconsin Breast Cancer Dataset from the UCI repository, which comprises 569 patient records, our model achieves an accuracy of 100% and an AUC-ROC of 1.00, indicating perfect sensitivity and specificity on that dataset. The proposed framework was also evaluated on two other datasets: the Wisconsin Prognostic Breast Cancer (WPBC) dataset and the Wisconsin Original Breast Cancer (WOBC) dataset. The model has the potential to enhance early detection and treatment strategies, marking an advance in applying machine learning to improve healthcare outcomes. Additionally, we have developed a user-friendly web app for breast cancer detection using our predictive model.
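The pipeline described in the abstract (consensus feature selection followed by a tuned stacking ensemble) can be sketched roughly as follows. This is an illustrative sketch under stated assumptions, not the authors' implementation: the abstract does not name the individual selectors, the base classifiers, the number of features kept, or the search space, so the ANOVA F-test filter, RFE wrapper, random-forest importances, the two base learners, `K`, and the parameter grid are all placeholders chosen for illustration. The helper names `consensus_features`, `build_search`, and `run_demo` are hypothetical. The data come from scikit-learn's bundled copy of the Wisconsin Diagnostic dataset (569 records).

```python
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier, StackingClassifier
from sklearn.feature_selection import RFE, SelectKBest, f_classif
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import RandomizedSearchCV, train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

RANDOM_STATE = 0
K = 20  # assumption: features kept per selector (not stated in the abstract)


def consensus_features(X, y, k=K):
    """Return the column indices chosen by all three selector families."""
    X_scaled = StandardScaler().fit_transform(X)

    # Filter method (assumed): univariate ANOVA F-test.
    filt = SelectKBest(f_classif, k=k).fit(X_scaled, y)
    filter_idx = set(np.flatnonzero(filt.get_support()))

    # Wrapper method (assumed): recursive feature elimination.
    rfe = RFE(LogisticRegression(max_iter=1000), n_features_to_select=k)
    wrapper_idx = set(np.flatnonzero(rfe.fit(X_scaled, y).get_support()))

    # Embedded method (assumed): random-forest feature importances.
    forest = RandomForestClassifier(n_estimators=100, random_state=RANDOM_STATE)
    forest.fit(X_scaled, y)
    embedded_idx = set(np.argsort(forest.feature_importances_)[-k:])

    common = filter_idx & wrapper_idx & embedded_idx
    if not common:
        raise ValueError("the three selectors share no feature; increase k")
    return sorted(int(i) for i in common)


def build_search():
    """Stacking ensemble wrapped in a randomized hyperparameter search."""
    stack = StackingClassifier(
        estimators=[
            ("rf", RandomForestClassifier(random_state=RANDOM_STATE)),
            ("svc", make_pipeline(StandardScaler(), SVC())),
        ],
        final_estimator=LogisticRegression(max_iter=1000),
        cv=3,
    )
    # Assumed search space; the paper's actual grid is not given in the abstract.
    param_distributions = {
        "rf__n_estimators": [50, 100],
        "rf__max_depth": [None, 4, 8],
        "svc__svc__C": [0.1, 1.0, 10.0],
        "final_estimator__C": [0.1, 1.0, 10.0],
    }
    return RandomizedSearchCV(
        stack,
        param_distributions,
        n_iter=4,
        cv=3,
        scoring="accuracy",
        random_state=RANDOM_STATE,
    )


def run_demo():
    """Select features on the training split only, tune, and score on held-out data."""
    data = load_breast_cancer()
    X_train, X_test, y_train, y_test = train_test_split(
        data.data,
        data.target,
        test_size=0.2,
        stratify=data.target,
        random_state=RANDOM_STATE,
    )
    cols = consensus_features(X_train, y_train)
    search = build_search().fit(X_train[:, cols], y_train)
    accuracy = accuracy_score(y_test, search.predict(X_test[:, cols]))
    return list(data.feature_names[cols]), accuracy
```

Calling `run_demo()` returns the names of the consensus features and the held-out accuracy. Feature selection is fitted on the training split only, so the test score is not influenced by information from the held-out records; the score obtained with these placeholder choices should not be read as a reproduction of the reported results.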