Cost-Efficient Early Diagnostic Tool for Lung Cancer: Explainable AI in Clinical Systems.

IF 2.8 | Zone 4, Medicine | JCR Q3, Oncology

Technology in Cancer Research & Treatment, vol. 24. Pub Date: 2025-01-01. Epub Date: 2025-08-14. DOI: 10.1177/15330338251370239

Anu Maria Sebastian, David Peter, T P Rajagopal, Rinu Ann Sebastian

Citations: 0

Abstract

Introduction: Lung cancer has the highest mortality rate of any cancer type worldwide, largely because of delayed or ineffective diagnosis and treatment. Radiomics is commonly used to diagnose lung cancer, especially in later stages or during routine screening. However, frequent radiological imaging poses health risks, and although advanced diagnostic alternatives exist, they are often costly and accessible only to a limited, privileged population. Applying machine learning (ML) and artificial intelligence (AI) to clinical data offers a safer, more inclusive, and more affordable alternative. Clinicians, however, have been slow to adopt AI-based models for cancer diagnosis because the models lack interpretability.

Methods: This study introduces a safe, inclusive, and cost-effective lung cancer diagnostic method using an explainable AI (XAI) model built on routine clinical data. It employs a stacking ensemble of an Artificial Neural Network (ANN) and a Deep Neural Network (DNN) to match the diagnostic performance of clean-data DNN models. By incorporating rare medical cases through Adaptive Synthetic Sampling (ADASYN), the model reduces the risk of missing challenging, rare-case diagnoses.

Results: The proposed XAI model achieves an accuracy of 0.8558, an AUC of 0.8600, a precision of 0.8092, a recall of 0.9282, and an F1-score of 0.8646, and improves rare-case detection by over 50%. SHapley Additive exPlanations (SHAP)-based interpretability highlights erythrocyte sedimentation rate (ESR), intoxication-related factors, hemoglobin levels, and neutrophil counts as key features. The model also reveals associations, such as a link between heavy tobacco use and elevated ESR. Counterfactual explanations help identify features contributing to misdiagnoses by exposing sources of confusion in the model's decisions.

Conclusion: Given the limited dataset size and geographic constraints, this research should be viewed as a prototype; in its current form, the model is best suited as a pre-screening tool to support early detection. With training on larger and more diverse datasets, the model has strong potential to evolve into a robust and scalable diagnostic solution.
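The reported metrics can be cross-checked against one another, since the F1-score is by definition the harmonic mean of precision and recall. The short sketch below is an illustration added here, not code from the study; the helper name `f1_score` and the rounding tolerance are assumptions. It recomputes F1 from the precision and recall given in the abstract and compares it with the reported value.

```python
def f1_score(precision: float, recall: float) -> float:
    """Return the harmonic mean of precision and recall (the F1-score)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Values as reported in the abstract (four decimal places).
reported = {"precision": 0.8092, "recall": 0.9282, "f1": 0.8646}

derived_f1 = f1_score(reported["precision"], reported["recall"])
print(f"derived F1 = {derived_f1:.4f}, reported F1 = {reported['f1']:.4f}")

# Assumed tolerance: half a unit in the third decimal place, which is
# generous given that the inputs are themselves rounded to four places.
consistent = abs(derived_f1 - reported["f1"]) < 5e-4
print("consistent:", consistent)
```

Under these assumptions the derived and reported F1 values agree to rounding. Recall (0.9282) is noticeably higher than precision (0.8092), meaning the model misses relatively few positive cases at the cost of more false positives, a trade-off that fits the pre-screening role described in the conclusion.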

Source journal

Technology in Cancer Research & Treatment (Medicine: Oncology)

CiteScore: 4.40

Self-citation rate: 0.00%

Articles published: 202

Review time: 2 months

Journal description: Technology in Cancer Research & Treatment (TCRT) is a JCR-ranked, broad-spectrum, open access, peer-reviewed publication whose aim is to provide researchers and clinicians with a platform to share and discuss developments in the prevention, diagnosis, treatment, and monitoring of cancer.