Explainable AI-driven analysis of radiology reports using text and image data: An experimental study.

IF 2.0 · Q3 · Health Care Sciences & Services
Muhammad Tayyab Zamir, Safir Ullah Khan, Alexander Gelbukh, Edgardo Manuel Felipe Riverón, Irina Gelbukh
{"title":"Explainable AI-driven analysis of radiology reports using text and image data: An experimental study.","authors":"Muhammad Tayyab Zamir, Safir Ullah Khan, Alexander Gelbukh, Edgardo Manuel Felipe Riverón, Irina Gelbukh","doi":"10.2196/77482","DOIUrl":null,"url":null,"abstract":"<p><strong>Background: </strong>Artificial intelligence is increasingly being integrated into clinical diagnostics, yet its lack of transparency hinders trust and adoption among healthcare professionals. The explainable AI (XAI) has the potential to improve interpretability and reliability of AI-based decisions in clinical practice.</p><p><strong>Objective: </strong>This study evaluates the use of Explainable AI (XAI) for interpreting radiology reports to improve healthcare practitioners' confidence and comprehension of AI-assisted diagnostics.</p><p><strong>Methods: </strong>This study employed the Indiana University chest X-ray Dataset containing 3169 textual reports and 6471 images. Textual were being classified as either normal or abnormal by using a range of machine learning approaches. This includes traditional machine learning models and ensemble methods, deep learning models (LSTM), and advanced transformer-based language models (GPT-2, T5, LLaMA-2, LLaMA-3.1). For image-based classifications, convolution neural networks (CNNs) including DenseNet121, and DenseNet169 were used. Top performing models were interpreted using Explainable AI (XAI) methods SHAP and LIME to support clinical decision making by enhancing transparency and trust in model predictions.</p><p><strong>Results: </strong>LLaMA-3.1 model achieved highest accuracy of 98% in classifying the textual radiology reports. Statistical analysis confirmed the model robustness, with Cohen's kappa (k=0.981) indicating near perfect agreement beyond chance, both Chi-Square and Fisher's Exact test revealing a high significant association between actual and predicted labels (p<0.0001). Although McNemar's Test yielded a non-significant result (p=0.25) suggests balance class performance. While the highest accuracy of 84% was achieved in the analysis of imaging data using the DenseNet169 and DenseNet121 models. To assess explainability, LIME and SHAP were applied to best performing models. These models consistently highlighted the medical related terms such as \"opacity\", \"consolidation\" and \"pleural\" are clear indication for abnormal finding in textual reports.</p><p><strong>Conclusions: </strong>The research underscores that explainability is an essential component of any AI systems used in diagnostics and helpful in the design and implementation of AI in the healthcare sector. 
Such approach improves the accuracy of the diagnosis and builds confidence in health workers, who in the future will use explainable AI in clinical settings, particularly in the application of AI explainability for medical purposes.</p>","PeriodicalId":14841,"journal":{"name":"JMIR Formative Research","volume":" ","pages":""},"PeriodicalIF":2.0000,"publicationDate":"2025-09-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"JMIR Formative Research","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.2196/77482","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q3","JCRName":"HEALTH CARE SCIENCES & SERVICES","Score":null,"Total":0}
引用次数: 0

Abstract

Background: Artificial intelligence (AI) is increasingly being integrated into clinical diagnostics, yet its lack of transparency hinders trust and adoption among healthcare professionals. Explainable AI (XAI) has the potential to improve the interpretability and reliability of AI-based decisions in clinical practice.

Objective: This study evaluates the use of XAI for interpreting radiology reports, with the aim of improving healthcare practitioners' confidence in and comprehension of AI-assisted diagnostics.

Methods: This study used the Indiana University chest X-ray dataset, containing 3169 textual reports and 6471 images. Textual reports were classified as either normal or abnormal using a range of machine learning approaches, including traditional machine learning models and ensemble methods, a deep learning model (long short-term memory, LSTM), and transformer-based language models (GPT-2, T5, LLaMA-2, and LLaMA-3.1). For image-based classification, convolutional neural networks (CNNs), namely DenseNet121 and DenseNet169, were used. The top-performing models were interpreted with the XAI methods SHAP (Shapley additive explanations) and LIME (local interpretable model-agnostic explanations) to support clinical decision-making by enhancing transparency and trust in model predictions.
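As an illustration of the text-classification and explanation pipeline described above, the following is a minimal sketch rather than the authors' actual implementation: it pairs a TF-IDF plus logistic regression baseline (one plausible "traditional machine learning" model) with a LIME explanation of a single report. The file name, column names, and label encoding are assumptions for the example.

```python
# Minimal sketch of the text arm: a traditional ML baseline (TF-IDF + logistic
# regression) with a LIME explanation. File name, column names, and label
# encoding are assumptions, not the paper's actual preprocessing.
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from lime.lime_text import LimeTextExplainer

# Hypothetical export of the Indiana University chest X-ray reports:
# one row per report with a free-text "findings" column and a binary label.
df = pd.read_csv("iu_xray_reports.csv")          # columns: findings, label
X_train, X_test, y_train, y_test = train_test_split(
    df["findings"], df["label"], test_size=0.2, stratify=df["label"], random_state=42
)

# Classical baseline: word n-gram TF-IDF features + logistic regression.
pipeline = Pipeline([
    ("tfidf", TfidfVectorizer(ngram_range=(1, 2), min_df=2)),
    ("clf", LogisticRegression(max_iter=1000)),
])
pipeline.fit(X_train, y_train)
print("accuracy:", accuracy_score(y_test, pipeline.predict(X_test)))

# LIME: perturb one report and fit a local surrogate model to see which tokens
# (e.g., "opacity", "consolidation", "pleural") push it toward "abnormal".
explainer = LimeTextExplainer(class_names=["normal", "abnormal"])
explanation = explainer.explain_instance(
    X_test.iloc[0], pipeline.predict_proba, num_features=10
)
print(explanation.as_list())                     # (token, weight) pairs
```

The same pattern extends to the stronger models in the study: any classifier that exposes a text-in, probabilities-out function can be passed to LIME in place of the baseline pipeline.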

Results: The LLaMA-3.1 model achieved the highest accuracy, 98%, in classifying the textual radiology reports. Statistical analysis confirmed the model's robustness: Cohen's kappa (κ=0.981) indicated near-perfect agreement beyond chance, and both the chi-square and Fisher exact tests revealed a highly significant association between actual and predicted labels (p<0.0001), while McNemar's test yielded a nonsignificant result (p=0.25), suggesting balanced class performance. For the imaging data, the highest accuracy was 84%, achieved with the DenseNet169 and DenseNet121 models. To assess explainability, LIME and SHAP were applied to the best-performing models; they consistently highlighted medical terms such as "opacity," "consolidation," and "pleural" as clear indicators of abnormal findings in the textual reports.
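To make the reported agreement statistics concrete, here is a minimal sketch of how Cohen's kappa, the chi-square and Fisher exact tests, and McNemar's test can be computed with scikit-learn, SciPy, and statsmodels. The labels below are synthetic placeholders, not the study's actual predictions.

```python
# Sketch of the agreement statistics reported above, computed from a model's
# predicted vs. actual labels. y_true / y_pred are synthetic placeholders
# standing in for the test-set labels and the LLaMA-3.1 predictions.
import numpy as np
from sklearn.metrics import cohen_kappa_score, confusion_matrix
from scipy.stats import chi2_contingency, fisher_exact
from statsmodels.stats.contingency_tables import mcnemar

rng = np.random.default_rng(0)
y_true = rng.integers(0, 2, size=634)                          # 0 = normal, 1 = abnormal
y_pred = np.where(rng.random(634) < 0.98, y_true, 1 - y_true)  # ~98% agreement

# Cohen's kappa: chance-corrected agreement between actual and predicted labels.
kappa = cohen_kappa_score(y_true, y_pred)

# 2x2 contingency table of actual vs. predicted classes.
table = confusion_matrix(y_true, y_pred)

# Chi-square and Fisher exact tests of association between the two labelings.
chi2, chi2_p, _, _ = chi2_contingency(table)
_, fisher_p = fisher_exact(table)

# McNemar's test compares the off-diagonal (disagreement) cells; a
# non-significant result suggests errors are balanced across the two classes.
mcnemar_p = mcnemar(table, exact=True).pvalue

print(f"kappa={kappa:.3f}  chi2 p={chi2_p:.2e}  fisher p={fisher_p:.2e}  mcnemar p={mcnemar_p:.3f}")
```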

Conclusions: This research underscores that explainability is an essential component of AI systems used in diagnostics and can guide the design and implementation of AI in the healthcare sector. Such an approach improves diagnostic accuracy and builds confidence among health workers, who will increasingly use explainable AI in clinical settings, particularly in applications of AI explainability for medical purposes.

Source journal: JMIR Formative Research (Medicine, miscellaneous) · CiteScore 2.70 · Self-citation rate 9.10% · Articles per year 579 · Review time 12 weeks