Rasheed Omobolaji Alabi, Antti A. Mäkitie, Mohammed Elmusrati, Alhadi Almangush, Ylva Tiblom Ehrsson, Göran Laurell
International Journal of Medical Informatics, volume 199, Article 105873. Published 2025-03-08. Journal Article. DOI: 10.1016/j.ijmedinf.2025.105873. Impact factor 3.7 (JCR Q2, Computer Science, Information Systems). https://www.sciencedirect.com/science/article/pii/S1386505625000905
Machine learning explainability for survival outcome in head and neck squamous cell carcinoma
Background
The diagnosis and treatment of head and neck squamous cell carcinoma (HNSCC) affect psychological variables and induce treatment-related toxicity in patients. The evaluation of outcomes is warranted for effective treatment planning and improved disease management.
Objectives
This study aimed to build a prognostic system that combines clinicopathological parameters, treatment-related factors, and sociodemographic factors as integrative inputs to a machine learning (ML) model that estimates the overall survival (OS) of patients with HNSCC. Furthermore, we explored the complementary prognostic potential of these input parameters. We provide explainability and interpretability using the Local Interpretable Model-agnostic Explanations (LIME) and SHapley Additive exPlanations (SHAP) techniques.
Methods
A total of 419 patients with HNSCC were recruited from three university hospitals in Sweden. We compared the performance of TabNet, a state-of-the-art deep learning algorithm for tabular data, with that of extreme gradient boosting (XGBoost) and a voting ensemble in predicting OS in patients with HNSCC.
Results
TabNet and XGBoost performed comparably, each with an accuracy of 88.1%, while the voting ensemble reached an accuracy of 88.7%. The aggregate feature importance showed that p16 (a tumor suppressor protein that plays a crucial role in cell cycle regulation), cancer stage, hemoglobin, age at diagnosis, T class, N class, smoking pack-years, body mass index (BMI), treatment modality, erythrocyte count, and human papillomavirus (HPV) status were the most important parameters for the model's ability to predict OS. Furthermore, we found survival trends in this cohort by individually considering parameters such as p16, cancer stage, hemoglobin, age at diagnosis, HPV status, Tumor Node Metastasis (TNM) staging, and socioeconomic factors (marital status, housing, and level of education). In addition, both the LIME and SHAP techniques showed the contribution of each feature to the prediction made by the model.
Conclusions
The clinical implementation of an ML model can lead to individualized, risk-based therapeutic decision-making. Therefore, validating these models with multi-institutional datasets and testing them in the context of clinical trials is warranted for safe clinical implementation.
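The abstract reports per-feature contributions from SHAP without showing how such a contribution is computed. SHAP is built on Shapley values, and the sketch below computes them exactly for a single prediction by enumerating every feature coalition. Everything in it is an assumption made for illustration: the `toy_risk` function, its coefficients, the three binary feature names, and the all-zero reference patient are hypothetical and are not the study's model, data, or results.

```python
from itertools import combinations
from math import factorial


def shapley_values(predict, instance, baseline):
    """Exact Shapley values for one prediction.

    Features outside a coalition are replaced by their baseline value.
    The cost grows as 2^n in the number of features, so this only suits
    tiny examples; SHAP libraries rely on faster approximations.
    """
    names = list(instance)
    n = len(names)
    values = {}
    for target in names:
        others = [name for name in names if name != target]
        total = 0.0
        for size in range(n):
            # Shapley weight for coalitions of this size
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for coalition in combinations(others, size):
                without = {
                    k: (instance[k] if k in coalition else baseline[k])
                    for k in names
                }
                with_target = dict(without)
                with_target[target] = instance[target]
                # Marginal effect of adding the target feature
                total += weight * (predict(with_target) - predict(without))
        values[target] = total
    return values


def toy_risk(x):
    """Hypothetical risk score with one interaction term (illustration only)."""
    score = 0.1
    score += 0.3 * x["advanced_stage"]
    score -= 0.2 * x["p16_positive"]
    score += 0.1 * x["advanced_stage"] * x["low_hemoglobin"]
    return score


# Hypothetical patient and reference point
patient = {"p16_positive": 1, "advanced_stage": 1, "low_hemoglobin": 1}
reference = {"p16_positive": 0, "advanced_stage": 0, "low_hemoglobin": 0}

contributions = shapley_values(toy_risk, patient, reference)
for name, value in contributions.items():
    print(f"{name}: {value:+.3f}")
```

The contributions sum to the difference between the patient's score and the reference score, which is the additivity property that makes Shapley-based explanations straightforward to read. The interaction term is split equally between the two features involved.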
About the journal:
International Journal of Medical Informatics provides an international medium for dissemination of original results and interpretative reviews concerning the field of medical informatics. The Journal emphasizes the evaluation of systems in healthcare settings.
The scope of the journal covers:
Information systems, including national or international registration systems, hospital information systems, departmental and/or physician's office systems, document handling systems, electronic medical record systems, standardization, systems integration, etc.;
Computer-aided medical decision support systems using heuristic, algorithmic and/or statistical methods as exemplified in decision theory, protocol development, artificial intelligence, etc.;
Educational computer-based programs pertaining to medical informatics or medicine in general;
Organizational, economic, social, clinical impact, ethical and cost-benefit aspects of IT applications in health care.