Shahid Mohammad Ganie, Pijush Kanti Dutta Pramanik, Zhongming Zhao
{"title":"Enhanced and Interpretable Prediction of Multiple Cancer Types Using a Stacking Ensemble Approach with SHAP Analysis.","authors":"Shahid Mohammad Ganie, Pijush Kanti Dutta Pramanik, Zhongming Zhao","doi":"10.3390/bioengineering12050472","DOIUrl":null,"url":null,"abstract":"<p><p><b>Background</b>: Cancer is a leading cause of death worldwide, and its early detection is crucial for improving patient outcomes. This study aimed to develop and evaluate ensemble learning models, specifically stacking, for the accurate prediction of lung, breast, and cervical cancers using lifestyle and clinical data. <b>Methods</b>: 12 base learners were trained on datasets for lung, breast, and cervical cancer. Stacking ensemble models were then developed using these base learners. The models were evaluated for accuracy, precision, recall, F1-score, AUC-ROC, MCC, and kappa. An explainable AI technique, SHAP, was used to interpret model predictions. <b>Results</b>: The stacking ensemble model outperformed individual base learners across all three cancer types. On average, for three cancer datasets, it achieved 99.28% accuracy, 99.55% precision, 97.56% recall, and 98.49% F1-score. A similar high performance was observed in terms of AUC, Kappa, and MCC. The SHAP analysis revealed the most influential features for each cancer type, e.g., fatigue and alcohol consumption for lung cancer, worst concave points, mean concave points, and worst perimeter for breast cancer and Schiller test for cervical cancer. <b>Conclusions</b>: The stacking-based multi-cancer prediction model demonstrated superior accuracy and interpretability compared with traditional models. Combining diverse base learners with explainable AI offers predictive power and transparency in clinical applications. Key demographic and clinical features driving cancer risk were also identified. Further research should validate the model on more diverse populations and cancer types.</p>","PeriodicalId":8874,"journal":{"name":"Bioengineering","volume":"12 5","pages":""},"PeriodicalIF":3.8000,"publicationDate":"2025-04-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12108849/pdf/","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Bioengineering","FirstCategoryId":"5","ListUrlMain":"https://doi.org/10.3390/bioengineering12050472","RegionNum":3,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q2","JCRName":"ENGINEERING, BIOMEDICAL","Score":null,"Total":0}
引用次数: 0
Abstract
Background: Cancer is a leading cause of death worldwide, and its early detection is crucial for improving patient outcomes. This study aimed to develop and evaluate ensemble learning models, specifically stacking, for the accurate prediction of lung, breast, and cervical cancers using lifestyle and clinical data. Methods: 12 base learners were trained on datasets for lung, breast, and cervical cancer. Stacking ensemble models were then developed using these base learners. The models were evaluated for accuracy, precision, recall, F1-score, AUC-ROC, MCC, and kappa. An explainable AI technique, SHAP, was used to interpret model predictions. Results: The stacking ensemble model outperformed individual base learners across all three cancer types. On average, for three cancer datasets, it achieved 99.28% accuracy, 99.55% precision, 97.56% recall, and 98.49% F1-score. A similar high performance was observed in terms of AUC, Kappa, and MCC. The SHAP analysis revealed the most influential features for each cancer type, e.g., fatigue and alcohol consumption for lung cancer, worst concave points, mean concave points, and worst perimeter for breast cancer and Schiller test for cervical cancer. Conclusions: The stacking-based multi-cancer prediction model demonstrated superior accuracy and interpretability compared with traditional models. Combining diverse base learners with explainable AI offers predictive power and transparency in clinical applications. Key demographic and clinical features driving cancer risk were also identified. Further research should validate the model on more diverse populations and cancer types.
期刊介绍:
Aims
Bioengineering (ISSN 2306-5354) provides an advanced forum for the science and technology of bioengineering. It publishes original research papers, comprehensive reviews, communications and case reports. Our aim is to encourage scientists to publish their experimental and theoretical results in as much detail as possible. All aspects of bioengineering are welcomed from theoretical concepts to education and applications. There is no restriction on the length of the papers. The full experimental details must be provided so that the results can be reproduced. There are, in addition, four key features of this Journal:
● We are introducing a new concept in scientific and technical publications “The Translational Case Report in Bioengineering”. It is a descriptive explanatory analysis of a transformative or translational event. Understanding that the goal of bioengineering scholarship is to advance towards a transformative or clinical solution to an identified transformative/clinical need, the translational case report is used to explore causation in order to find underlying principles that may guide other similar transformative/translational undertakings.
● Manuscripts regarding research proposals and research ideas will be particularly welcomed.
● Electronic files and software regarding the full details of the calculation and experimental procedure, if unable to be published in a normal way, can be deposited as supplementary material.
● We also accept manuscripts communicating to a broader audience with regard to research projects financed with public funds.
Scope
● Bionics and biological cybernetics: implantology; bio–abio interfaces
● Bioelectronics: wearable electronics; implantable electronics; “more than Moore” electronics; bioelectronics devices
● Bioprocess and biosystems engineering and applications: bioprocess design; biocatalysis; bioseparation and bioreactors; bioinformatics; bioenergy; etc.
● Biomolecular, cellular and tissue engineering and applications: tissue engineering; chromosome engineering; embryo engineering; cellular, molecular and synthetic biology; metabolic engineering; bio-nanotechnology; micro/nano technologies; genetic engineering; transgenic technology
● Biomedical engineering and applications: biomechatronics; biomedical electronics; biomechanics; biomaterials; biomimetics; biomedical diagnostics; biomedical therapy; biomedical devices; sensors and circuits; biomedical imaging and medical information systems; implants and regenerative medicine; neurotechnology; clinical engineering; rehabilitation engineering
● Biochemical engineering and applications: metabolic pathway engineering; modeling and simulation
● Translational bioengineering