The Hidden Threat of Hallucinations in Binary Chest X-ray Pneumonia Classification.

Proceedings. IEEE International Symposium on Computer-Based Medical Systems. Pub Date: 2025-06-01; Epub Date: 2025-07-04; DOI: 10.1109/cbms65348.2025.00138

Sivaramakrishnan Rajaraman, Zhaohui Liang, Niccolo Marini, Zhiyun Xue, Sameer Antani

Volume 2025, pages 668-673. PDF (PMC): https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12369649/pdf/


Abstract

Hallucination in deep learning (DL) classification, where DL models yield confidently erroneous predictions, remains a pressing concern. This study investigates whether binary classifiers truly learn disease-specific features when distinguishing the overlapping radiological presentations of pneumonia subtypes on chest X-ray (CXR) images. Specifically, we evaluate whether an uncertainty measure is a valuable tool for classifying signs of different pathogen-specific pneumonia subtypes. We evaluated two binary classifiers that distinguish bacterial pneumonia and viral pneumonia, respectively, from normal CXRs. A third classifier, which distinguishes bacterial from viral pneumonia, was evaluated to highlight our concern about the hallucinations observed with the first two. Our analysis computes the Matthews correlation coefficient (MCC) and prediction entropy on a pediatric CXR dataset and reveals that the normal/bacterial and normal/viral classifiers consistently and confidently misclassify the unseen pneumonia subtype as their respective disease class. These findings expose a critical limitation: binary classifiers tend to hallucinate by relying on general pneumonia indicators rather than pathogen-specific patterns, which challenges their utility in clinical workflows.
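The abstract names two metrics without defining them. As a minimal sketch, assuming binary labels in {0, 1} (1 = pneumonia) and a single sigmoid output p = P(disease) per image, the Matthews correlation coefficient is MCC = (TP*TN - FP*FN) / sqrt((TP+FP)(TP+FN)(TN+FP)(TN+FN)), and binary prediction entropy is H(p) = -p log2 p - (1-p) log2(1-p). Using base 2 (bits) is an assumption; the paper may use a different base. The helper `confident_disease_rate`, its 0.5 decision threshold, and its 0.5-bit entropy cutoff are hypothetical illustrations of how one might quantify confident misclassification of the unseen subtype. They are not taken from the paper.

```python
import math
from typing import Sequence


def matthews_corrcoef(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """MCC for binary labels in {0, 1}; returns 0.0 when the denominator is zero."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom > 0 else 0.0


def binary_entropy(p: float, eps: float = 1e-12) -> float:
    """Prediction entropy in bits (assumed base 2) for a sigmoid output p."""
    # Clip so log2 never sees exactly 0 or 1.
    p = min(max(p, eps), 1.0 - eps)
    return -(p * math.log2(p) + (1.0 - p) * math.log2(1.0 - p))


def confident_disease_rate(
    probs: Sequence[float], threshold: float = 0.5, max_entropy: float = 0.5
) -> float:
    """Hypothetical helper: fraction of inputs (e.g. CXRs of the subtype the
    classifier never saw) labeled 'disease' with entropy at or below max_entropy.
    Both cutoffs are illustrative assumptions, not values from the paper."""
    if not probs:
        return 0.0
    hits = sum(1 for p in probs if p >= threshold and binary_entropy(p) <= max_entropy)
    return hits / len(probs)


if __name__ == "__main__":
    # Toy sigmoid outputs from a hypothetical normal/bacterial model on viral CXRs.
    viral_probs = [0.99, 0.98, 0.6, 0.1]
    print(confident_disease_rate(viral_probs))  # 0.5
```

A high value of this rate, combined with low entropy, is one way to make "confidently misclassify" concrete: the model commits to the disease label without signaling uncertainty, even though its training data never contained that pneumonia subtype.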

