Cost-Efficient Early Diagnostic Tool for Lung Cancer: Explainable AI in Clinical Systems.

IF 2.8, Zone 4 (Medicine), JCR Q3 (Oncology)
Technology in Cancer Research & Treatment, vol. 24. Pub Date: 2025-01-01; Epub Date: 2025-08-14; DOI: 10.1177/15330338251370239
Anu Maria Sebastian, David Peter, T P Rajagopal, Rinu Ann Sebastian

Abstract

Introduction: Lung cancer has the highest mortality rate among all cancer types globally, largely due to delayed or ineffective diagnosis and treatment. Radiomics is commonly used to diagnose lung cancer, especially in later stages or during routine screening. However, frequent radiological imaging poses health risks, and while advanced diagnostic alternatives exist, they are often costly and accessible only to a limited, privileged population. Applying machine learning (ML) and artificial intelligence (AI) to clinical data enables a safer, more inclusive, and more affordable solution. Because they lack interpretability, however, AI-based models for cancer diagnosis have seen limited adoption by clinicians.

Methods: This study introduces a safe, inclusive, and cost-effective lung cancer diagnostic method using an explainable AI (XAI) model built on routine clinical data. It employs a stacking ensemble of an Artificial Neural Network (ANN) and a Deep Neural Network (DNN) to match the diagnostic performance of clean-data DNN models. By incorporating rare medical cases through Adaptive Synthetic Sampling (ADASYN), the model reduces the risk of missing challenging, rare-case diagnoses.

Results: The proposed XAI model demonstrates strong performance, with an accuracy of 0.8558, AUC of 0.8600, precision of 0.8092, recall of 0.9282, and F1-score of 0.8646, notably improving rare-case detection by over 50%. SHapley Additive exPlanations (SHAP)-based interpretability highlights erythrocyte sedimentation rate (ESR), intoxication-related factors, hemoglobin levels, and neutrophil counts as key features. The model also reveals associations, such as a link between heavy tobacco use and elevated ESR. Counterfactual explanations help identify features contributing to misdiagnoses by exposing sources of confusion in the model's decisions.

Conclusion: Given the limited dataset size and geographic constraints, this research should be viewed as a prototype; in its current form, the model is best suited as a pre-screening tool to support early detection. With training on larger and more diverse datasets, the model has strong potential to evolve into a robust and scalable diagnostic solution.
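
The reported F1-score can be cross-checked against the reported precision and recall, since F1 is their harmonic mean. The sketch below is only an arithmetic consistency check on the numbers quoted in the abstract; the function name `f1_from_precision_recall` is illustrative and is not taken from the paper, and nothing here reproduces the authors' model or data.

```python
def f1_from_precision_recall(precision: float, recall: float) -> float:
    """Return the F1-score, i.e. the harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0  # convention: F1 is 0 when both inputs are 0
    return 2 * precision * recall / (precision + recall)


# Values as quoted in the abstract (hypothetical variable names).
reported_precision = 0.8092
reported_recall = 0.9282
reported_f1 = 0.8646

computed_f1 = f1_from_precision_recall(reported_precision, reported_recall)
print(f"computed F1 = {computed_f1:.4f}, reported F1 = {reported_f1:.4f}")
# prints: computed F1 = 0.8646, reported F1 = 0.8646
```

The recomputed value agrees with the reported F1-score to four decimal places, so the three figures are mutually consistent. The gap between recall (0.9282) and precision (0.8092) indicates that the model favors catching positive cases at the cost of more false positives, which fits the pre-screening role described in the conclusion.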

Source journal
CiteScore: 4.40
Self-citation rate: 0.00%
Articles published: 202
Review time: 2 months
Journal description: Technology in Cancer Research & Treatment (TCRT) is a JCR-ranked, broad-spectrum, open access, peer-reviewed publication whose aim is to provide researchers and clinicians with a platform to share and discuss developments in the prevention, diagnosis, treatment, and monitoring of cancer.