The Hidden Threat of Hallucinations in Binary Chest X-ray Pneumonia Classification.

Sivaramakrishnan Rajaraman, Zhaohui Liang, Niccolo Marini, Zhiyun Xue, Sameer Antani
Proceedings. IEEE International Symposium on Computer-Based Medical Systems, vol. 2025, pp. 668-673. Published 2025-06-01 (Epub 2025-07-04).
DOI: https://doi.org/10.1109/cbms65348.2025.00138
Full text (PMC): https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12369649/pdf/
Citations: 0

Abstract

Hallucination in deep learning (DL) classification, where DL models yield confidently erroneous predictions, remains a pressing concern. This study investigates whether binary classifiers truly learn disease-specific features when distinguishing overlapping radiological presentations among pneumonia subtypes on chest X-ray (CXR) images. Specifically, we evaluate whether an uncertainty measure is a valuable tool for classifying signs of different pathogen-specific subtypes of pneumonia. We evaluated two binary classifiers that distinguish bacterial pneumonia and viral pneumonia, respectively, from normal CXRs. A third classifier explored the ability to distinguish bacterial from viral pneumonia presentations, to highlight our concern regarding the hallucinations observed in the former two cases. Our analysis computes the Matthews Correlation Coefficient and prediction entropy on a pediatric CXR dataset and reveals that the normal/bacterial and normal/viral classifiers consistently and confidently misclassify the unseen pneumonia subtype as their respective disease class. These findings expose a critical limitation: binary classifiers tend to hallucinate by relying on general pneumonia indicators rather than pathogen-specific patterns, which challenges their utility in clinical workflows.
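The two metrics named in the abstract can be illustrated with a minimal sketch. This is not the authors' code: the function names, the use of base-2 logarithms, and the sample probabilities are assumptions chosen for illustration only. It shows how a normal/bacterial classifier that assigns high disease probability to viral cases it never saw would produce low prediction entropy, that is, a confident output on an out-of-scope input.

```python
import math


def matthews_corrcoef(y_true, y_pred):
    """Matthews Correlation Coefficient for binary labels (0 or 1)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # By convention, MCC is reported as 0 when the denominator vanishes.
    return 0.0 if denom == 0 else (tp * tn - fp * fn) / denom


def binary_entropy(p_disease):
    """Shannon entropy in bits of a binary prediction with P(disease) = p_disease."""
    if p_disease <= 0.0 or p_disease >= 1.0:
        return 0.0  # a fully certain prediction carries zero entropy
    q = 1.0 - p_disease
    return -(p_disease * math.log2(p_disease) + q * math.log2(q))


# Hypothetical outputs of a normal/bacterial classifier on viral CXRs
# (made-up numbers, not results from the paper).
viral_case_probs = [0.97, 0.95, 0.99]
entropies = [binary_entropy(p) for p in viral_case_probs]
mean_entropy = sum(entropies) / len(entropies)
print(f"mean entropy on unseen subtype: {mean_entropy:.3f} bits")
```

Entropy near 0 bits means the classifier is confident, while 1 bit is the maximum uncertainty for a binary decision. Low entropy on inputs from an unseen class is the pattern the abstract describes as hallucination.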
