Enhanced and Interpretable Prediction of Multiple Cancer Types Using a Stacking Ensemble Approach with SHAP Analysis.

IF 3.8 | Zone 3, Medicine | Q2 ENGINEERING, BIOMEDICAL

Bioengineering Pub Date : 2025-04-29 DOI:10.3390/bioengineering12050472

Shahid Mohammad Ganie, Pijush Kanti Dutta Pramanik, Zhongming Zhao

Bioengineering, volume 12, issue 5. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12108849/pdf/

Citations: 0

Abstract

Background: Cancer is a leading cause of death worldwide, and early detection is crucial for improving patient outcomes. This study aimed to develop and evaluate ensemble learning models, specifically stacking, for the accurate prediction of lung, breast, and cervical cancers using lifestyle and clinical data.

Methods: Twelve base learners were trained on datasets for lung, breast, and cervical cancer. Stacking ensemble models were then built from these base learners. The models were evaluated on accuracy, precision, recall, F1-score, AUC-ROC, Matthews correlation coefficient (MCC), and kappa. SHAP (SHapley Additive exPlanations), an explainable AI technique, was used to interpret model predictions.

Results: The stacking ensemble model outperformed the individual base learners across all three cancer types. Averaged over the three cancer datasets, it achieved 99.28% accuracy, 99.55% precision, 97.56% recall, and 98.49% F1-score. Similarly high performance was observed for AUC, kappa, and MCC. The SHAP analysis revealed the most influential features for each cancer type, for example fatigue and alcohol consumption for lung cancer; worst concave points, mean concave points, and worst perimeter for breast cancer; and the Schiller test for cervical cancer.

Conclusions: The stacking-based multi-cancer prediction model demonstrated superior accuracy and interpretability compared with traditional models. Combining diverse base learners with explainable AI offers both predictive power and transparency in clinical applications. Key demographic and clinical features driving cancer risk were also identified. Further research should validate the model on more diverse populations and cancer types.
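Most of the evaluation metrics listed in the Methods can be computed directly from a binary confusion matrix. The following is a minimal sketch of those formulas, not the authors' evaluation code. The function name `binary_metrics` and the example counts are hypothetical, and "kappa" is assumed here to mean Cohen's kappa. AUC-ROC is left out because it requires predicted scores rather than confusion-matrix counts.

```python
from math import sqrt


def binary_metrics(tp: int, fp: int, fn: int, tn: int) -> dict:
    """Confusion-matrix metrics for a binary classifier (illustrative helper)."""
    n = tp + fp + fn + tn
    accuracy = (tp + tn) / n
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)

    # Matthews correlation coefficient: correlation between the
    # predicted and the true labels, in the range [-1, 1].
    denom = sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0

    # Cohen's kappa (assumed meaning of "kappa"): observed agreement
    # corrected for the agreement expected by chance.
    p_observed = accuracy
    p_chance = ((tp + fp) * (tp + fn) + (tn + fn) * (tn + fp)) / n ** 2
    kappa = ((p_observed - p_chance) / (1 - p_chance)
             if p_chance != 1 else 0.0)

    return {
        "accuracy": accuracy,
        "precision": precision,
        "recall": recall,
        "f1": f1,
        "mcc": mcc,
        "kappa": kappa,
    }


if __name__ == "__main__":
    # Made-up counts for illustration only; these are not results from the paper.
    for name, value in binary_metrics(tp=40, fp=1, fn=2, tn=57).items():
        print(f"{name}: {value:.4f}")
```

Unlike accuracy, MCC and kappa both account for class imbalance, which is one reason to report them alongside accuracy on medical datasets where positive cases may be rare.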


Source journal

Bioengineering (Chemical Engineering: Bioengineering)
CiteScore: 4.00
Self-citation rate: 8.70%
Articles published: 661

About the journal

Aims

Bioengineering (ISSN 2306-5354) provides an advanced forum for the science and technology of bioengineering. It publishes original research papers, comprehensive reviews, communications and case reports. Our aim is to encourage scientists to publish their experimental and theoretical results in as much detail as possible. All aspects of bioengineering are welcome, from theoretical concepts to education and applications. There is no restriction on the length of the papers. The full experimental details must be provided so that the results can be reproduced.

There are, in addition, four key features of this journal:

● We are introducing a new concept in scientific and technical publications: “The Translational Case Report in Bioengineering”. It is a descriptive explanatory analysis of a transformative or translational event. Understanding that the goal of bioengineering scholarship is to advance towards a transformative or clinical solution to an identified transformative/clinical need, the translational case report is used to explore causation in order to find underlying principles that may guide other similar transformative/translational undertakings.
● Manuscripts regarding research proposals and research ideas will be particularly welcomed.
● Electronic files and software regarding the full details of the calculation and experimental procedure, if unable to be published in a normal way, can be deposited as supplementary material.
● We also accept manuscripts communicating to a broader audience with regard to research projects financed with public funds.

Scope

● Bionics and biological cybernetics: implantology; bio–abio interfaces
● Bioelectronics: wearable electronics; implantable electronics; “more than Moore” electronics; bioelectronics devices
● Bioprocess and biosystems engineering and applications: bioprocess design; biocatalysis; bioseparation and bioreactors; bioinformatics; bioenergy; etc.
● Biomolecular, cellular and tissue engineering and applications: tissue engineering; chromosome engineering; embryo engineering; cellular, molecular and synthetic biology; metabolic engineering; bio-nanotechnology; micro/nano technologies; genetic engineering; transgenic technology
● Biomedical engineering and applications: biomechatronics; biomedical electronics; biomechanics; biomaterials; biomimetics; biomedical diagnostics; biomedical therapy; biomedical devices; sensors and circuits; biomedical imaging and medical information systems; implants and regenerative medicine; neurotechnology; clinical engineering; rehabilitation engineering
● Biochemical engineering and applications: metabolic pathway engineering; modeling and simulation
● Translational bioengineering