Advanced explainable AI-driven biomarker identification for early breast cancer detection using peripheral blood mononuclear cells: Insights into prognostic biomarkers
Authors: Azam Jafarabadi, Elahe Sadat Abdolkarimi
Journal: Biomedical Signal Processing and Control, vol. 108, Article 107910 (Impact Factor 4.9; JCR Q1, Engineering, Biomedical)
Published: 2025-04-23
DOI: 10.1016/j.bspc.2025.107910
URL: https://www.sciencedirect.com/science/article/pii/S1746809425004215
Citations: 0
Abstract
Breast cancer is one of the leading causes of death worldwide. Despite advances in treatment, its increasing prevalence is a serious concern. Peripheral blood mononuclear cells (PBMCs) undergo gene expression changes when interacting with tumors, which makes them a promising source of biomarkers for early detection. This study aimed to identify potential breast cancer biomarkers using explainable artificial intelligence (XAI) and machine learning models. Two datasets, GSE27562 and GSE47862, comprising healthy individuals and breast cancer patients, were used. After preprocessing and data fusion, several machine learning models were tested, including AdaBoost, XGBoost, Random Forest, and Decision Tree. The AdaBoost model achieved the highest accuracy, 98%. SHAP values identified the ten genes with the greatest impact on the model's predictions: MRPL3, SLC36A4, COMT, HAAO, KCTD10, FCHO1, RND2, RBM7, LBX1, and LTB4R. Pathway and functional analysis showed that these genes are involved in processes such as protein metabolism and signal transduction, supporting their potential as biomarkers. Survival analysis was used to investigate the role of these genes in breast cancer prognosis, and Protein–Protein Interaction (PPI) analysis provided insight into their interaction networks. The findings highlight the importance of PBMCs as a non-invasive tool for breast cancer prognosis and indicate that, given its high accuracy, interpretability, and potential for clinical application, the method could help transform cancer prognosis and inform the development of therapeutic strategies.
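The modeling step described in the abstract (train a boosted classifier, then ask which genes drive its predictions) can be illustrated with a minimal sketch. Everything below is an assumption for illustration and not the authors' pipeline: the data are synthetic rather than GSE27562/GSE47862, the weak learners are decision stumps, the gene indices and function names are hypothetical, and the ranking uses each gene's share of total stump weight as a simple stand-in for SHAP values, which the study actually used.

```python
import math
import random


def make_synthetic_expression(n_samples=80, n_genes=6, informative=(1, 4), seed=0):
    """Hypothetical stand-in for a fused expression matrix (rows: samples, columns: genes)."""
    rng = random.Random(seed)
    X, y = [], []
    for i in range(n_samples):
        label = 1 if i % 2 == 0 else -1  # assumed coding: +1 patient, -1 healthy
        row = [rng.gauss(0.0, 1.0) for _ in range(n_genes)]
        for g in informative:
            row[g] += 1.5 * label  # only the informative genes shift with the class
        X.append(row)
        y.append(label)
    return X, y


def stump_predict(row, gene, threshold, polarity):
    """Decision stump: vote `polarity` above the threshold, `-polarity` otherwise."""
    return polarity if row[gene] > threshold else -polarity


def best_stump(X, y, w):
    """Return (weighted_error, gene, threshold, polarity) of the best stump.

    Assumes the weights `w` sum to 1, so flipping polarity turns error e into 1 - e.
    """
    best = None
    step = max(1, len(X) // 10)  # try roughly ten candidate thresholds per gene
    for gene in range(len(X[0])):
        for threshold in sorted(row[gene] for row in X)[::step]:
            err = sum(wi for row, yi, wi in zip(X, y, w)
                      if stump_predict(row, gene, threshold, 1) != yi)
            for polarity, e in ((1, err), (-1, 1.0 - err)):
                if best is None or e < best[0]:
                    best = (e, gene, threshold, polarity)
    return best


def fit_adaboost(X, y, n_rounds=10):
    """Discrete AdaBoost over stumps; returns a list of (alpha, gene, threshold, polarity)."""
    n = len(X)
    w = [1.0 / n] * n
    model = []
    for _ in range(n_rounds):
        err, gene, threshold, polarity = best_stump(X, y, w)
        err = min(max(err, 1e-10), 1.0 - 1e-10)  # keep the log finite
        alpha = 0.5 * math.log((1.0 - err) / err)
        model.append((alpha, gene, threshold, polarity))
        # Up-weight misclassified samples, down-weight correct ones, then renormalize.
        w = [wi * math.exp(-alpha * yi * stump_predict(row, gene, threshold, polarity))
             for row, yi, wi in zip(X, y, w)]
        total = sum(w)
        w = [wi / total for wi in w]
    return model


def predict(model, row):
    """Sign of the alpha-weighted vote of all stumps (+1 or -1)."""
    score = sum(alpha * stump_predict(row, gene, thr, pol)
                for alpha, gene, thr, pol in model)
    return 1 if score >= 0 else -1


def rank_genes(model, n_genes):
    """Share of total stump weight per gene, highest first (a crude importance, not SHAP)."""
    totals = [0.0] * n_genes
    for alpha, gene, _, _ in model:
        totals[gene] += alpha
    norm = sum(totals) or 1.0
    return sorted(((g, t / norm) for g, t in enumerate(totals)),
                  key=lambda pair: pair[1], reverse=True)


if __name__ == "__main__":
    X, y = make_synthetic_expression()
    model = fit_adaboost(X, y)
    accuracy = sum(predict(model, row) == yi for row, yi in zip(X, y)) / len(y)
    print(f"training accuracy: {accuracy:.2f}")
    for gene, share in rank_genes(model, n_genes=len(X[0])):
        print(f"gene_{gene}: {share:.3f}")
```

The sketch reports training accuracy only; a real evaluation would use held-out samples or cross-validation. On real data, a library implementation of AdaBoost together with a SHAP explainer would replace the hand-rolled pieces.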
About the journal:
Biomedical Signal Processing and Control aims to provide a cross-disciplinary international forum for the interchange of information on research in the measurement and analysis of signals and images in clinical medicine and the biological sciences. Emphasis is placed on contributions dealing with the practical, applications-led research on the use of methods and devices in clinical diagnosis, patient monitoring and management.
Biomedical Signal Processing and Control reflects the main areas in which these methods are being used and developed at the interface of both engineering and clinical science. The scope of the journal is defined to include relevant review papers, technical notes, short communications and letters. Tutorial papers and special issues will also be published.