Machine learning explainability for survival outcome in head and neck squamous cell carcinoma

IF 3.7 | Zone 2, Medicine | JCR Q2: COMPUTER SCIENCE, INFORMATION SYSTEMS
Rasheed Omobolaji Alabi, Antti A. Mäkitie, Mohammed Elmusrati, Alhadi Almangush, Ylva Tiblom Ehrsson, Göran Laurell
International Journal of Medical Informatics, Volume 199, Article 105873. Journal Article, published 2025-03-08.
DOI: 10.1016/j.ijmedinf.2025.105873
URL: https://www.sciencedirect.com/science/article/pii/S1386505625000905
Citations: 0

Abstract

Machine learning explainability for survival outcome in head and neck squamous cell carcinoma

Background

Diagnosis and treatment of head and neck squamous cell carcinoma (HNSCC) induces psychological variables and treatment-related toxicity in patients. The evaluation of outcomes is warranted for effective treatment planning and improved disease management.

Objectives

This study aimed to build a prognostic system by combining clinicopathological parameters, treatment-related factors, and sociodemographic factors as integrative inputs to build a machine learning (ML) model to estimate the overall survival (OS) of patients with HNSCC. Furthermore, we explored the complementary prognostic potentials of these input parameters. We provide explainability and interpretability using Local Interpretable Model-agnostic Explanations (LIME) and SHapley Additive exPlanations (SHAP) techniques.

Methods

A total of 419 patients with HNSCC were recruited from three University Hospitals in Sweden. We compared the performance of TabNet, a state-of-the-art deep learning algorithm for tabular data, with extreme gradient boosting (XGBoost) and voting ensemble to predict OS in patients with HNSCC.

Results

Both TabNet and XGBoost showed comparable performance accuracies, with TabNet and XGBoost showing a performance accuracy of 88.1% each and voting ensemble showing an accuracy of 88.7%. The aggregate feature importance showed that p16 (a tumor suppressor protein that plays a crucial role in cell cycle regulation), cancer stage, hemoglobin, age at diagnosis, T class, N class, smoking pack-years, body mass index (BMI), treatment modality, erythrocyte count, and human papillomavirus (HPV) status were the most important parameters for the predictive ability of the model for OS. Furthermore, we found survival trends in this cohort by individually considering parameters such as p16, cancer stage, hemoglobin, age at diagnosis, HPV status, Tumor Nodal Metastasis staging, and socioeconomic factors (marital status, housing, and level of education). In addition, both the LIME and SHAP techniques showed the contribution of each feature to the prediction made by the model.

Conclusions

The clinical implementation of an ML model can lead to individualized risk-based therapeutic decision-making. Therefore, validating these models with multi-institutional datasets and testing them in the context of clinical trials is warranted for safe clinical implementation.
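
As an illustration of what "the contribution of each feature to the prediction" means in the SHAP sense, the following is a minimal sketch of exact Shapley values for a single prediction. It is not the study's model or code: the scoring function `toy_risk`, the three feature codings, the coefficients, and the reference patient are all hypothetical, chosen only so the arithmetic is easy to follow. Exact enumeration is used here for clarity; practical SHAP implementations rely on approximations or model-specific algorithms because the cost doubles with every added feature.

```python
from itertools import combinations
from math import factorial


def shapley_values(predict, x, baseline):
    """Exact Shapley values for one instance x relative to a baseline.

    Features outside a coalition are set to their baseline value.
    The cost grows as 2**n, so this only suits a handful of features.
    """
    n = len(x)
    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for size in range(n):
            # Shapley weight for coalitions of this size
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for coalition in combinations(others, size):
                without_i = [x[j] if j in coalition else baseline[j] for j in range(n)]
                with_i = list(without_i)
                with_i[i] = x[i]
                # Marginal contribution of feature i to this coalition
                phi[i] += weight * (predict(with_i) - predict(without_i))
    return phi


def toy_risk(features):
    """Hypothetical risk score (an assumption, not the published model)."""
    p16_negative, stage, hemoglobin_low = features
    return 0.1 + 0.3 * p16_negative + 0.1 * stage + 0.2 * hemoglobin_low * p16_negative


patient = [1, 3, 1]    # hypothetical patient: p16-negative, stage 3, low hemoglobin
reference = [0, 1, 0]  # hypothetical reference: p16-positive, stage 1, normal hemoglobin

phi = shapley_values(toy_risk, patient, reference)
for name, value in zip(["p16_negative", "stage", "hemoglobin_low"], phi):
    print(f"{name}: {value:+.3f}")
```

The contributions add up to the difference between the patient's score and the reference score, which is the additive property that gives SHAP its name. The interaction term between the two binary features is split evenly between them.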
Source journal
International Journal of Medical Informatics (Medicine - Computer Science: Information Systems)
CiteScore: 8.90
Self-citation rate: 4.10%
Publication volume: 217
Review time: 42 days
About the journal: International Journal of Medical Informatics provides an international medium for dissemination of original results and interpretative reviews concerning the field of medical informatics. The Journal emphasizes the evaluation of systems in healthcare settings. The scope of the journal covers: information systems, including national or international registration systems, hospital information systems, departmental and/or physician's office systems, document handling systems, electronic medical record systems, standardization, systems integration, etc.; computer-aided medical decision support systems using heuristic, algorithmic and/or statistical methods as exemplified in decision theory, protocol development, artificial intelligence, etc.; educational computer-based programs pertaining to medical informatics or medicine in general; organizational, economic, social, clinical impact, ethical and cost-benefit aspects of IT applications in health care.