Failure Detection in Deep Neural Networks for Medical Imaging.

Frontiers in Medical Technology. Pub Date: 2022-07-22. eCollection Date: 2022-01-01. DOI: 10.3389/fmedt.2022.919046

Sabeen Ahmed, Dimah Dera, Saud Ul Hassan, Nidhal Bouaynaya, Ghulam Rasool


Citations: 8

Abstract

Deep neural networks (DNNs) are finding a role in modern healthcare, where they are being developed for diagnosis, prognosis, treatment planning, and outcome prediction across various diseases. As these applications multiply, the trustworthiness and reliability of DNNs become increasingly important. An essential aspect of trustworthiness is detecting performance degradation and failure of DNNs deployed in medical settings. The softmax outputs of a DNN are not a calibrated measure of model confidence: softmax probabilities are generally higher than the actual model confidence, and the gap between confidence and accuracy widens further for wrong predictions and noisy inputs. We employ recently proposed Bayesian deep neural networks (BDNNs) to learn uncertainty in the model parameters. These models output a prediction together with a measure of confidence in that prediction. By testing them under various noisy conditions, we show that the learned predictive confidence is well calibrated. We use these reliable confidence values to monitor performance degradation and detect failure in DNNs. We propose two failure detection methods. The first defines a fixed threshold based on how the predictive confidence behaves as the signal-to-noise ratio (SNR) of the test dataset changes. The second learns the threshold with a neural network. Both mechanisms abstain from making a decision when the confidence of the BDNN falls below the threshold and hold that decision for manual review. As a result, the accuracy of the models on unseen test samples improves. We tested the approach on three medical imaging datasets (PathMNIST, DermaMNIST, and OrganAMNIST) under different levels and types of noise. Increasing the noise in the test images increases the number of abstained samples. BDNNs are inherently robust and show more than 10% accuracy improvement with the proposed failure detection methods. An increased number of abstained samples or an abrupt increase in the predictive variance indicates model performance degradation or possible failure. Our work has the potential to improve the trustworthiness of DNNs and enhance user confidence in model predictions.
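The fixed-threshold abstention rule described in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation. The function names, the threshold value of 0.5, and the toy predictions, labels, and confidence scores are all hypothetical and chosen only for illustration. In the paper, the confidence comes from a BDNN and the threshold is derived from how confidence behaves as SNR changes; here both are simply supplied as inputs.

```python
from typing import List, Optional, Sequence, Tuple


def abstain_below_threshold(
    predictions: Sequence[int],
    confidences: Sequence[float],
    threshold: float,
) -> List[Optional[int]]:
    """Keep a prediction only if its confidence reaches the threshold.

    Abstained samples are marked with None (held for manual review).
    """
    return [p if c >= threshold else None for p, c in zip(predictions, confidences)]


def accuracy_on_retained(
    decisions: Sequence[Optional[int]],
    labels: Sequence[int],
) -> Tuple[Optional[float], int]:
    """Return (accuracy over non-abstained samples, number of abstained samples)."""
    kept = [(d, y) for d, y in zip(decisions, labels) if d is not None]
    abstained = len(decisions) - len(kept)
    if not kept:
        return None, abstained  # every sample was held for review
    correct = sum(1 for d, y in kept if d == y)
    return correct / len(kept), abstained


# Toy values, made up for illustration; these are NOT results from the paper.
predictions = [0, 1, 2, 1, 0, 2]
labels = [0, 1, 1, 1, 2, 2]
confidences = [0.95, 0.90, 0.40, 0.85, 0.30, 0.70]
THRESHOLD = 0.5  # hypothetical fixed threshold

decisions = abstain_below_threshold(predictions, confidences, THRESHOLD)
retained_accuracy, n_abstained = accuracy_on_retained(decisions, labels)
baseline_accuracy = sum(p == y for p, y in zip(predictions, labels)) / len(labels)

print(f"baseline accuracy: {baseline_accuracy:.2f}")
print(f"accuracy on retained samples: {retained_accuracy:.2f}")
print(f"abstained samples: {n_abstained}")
```

In this toy case the two low-confidence samples are also the two misclassified ones, so abstaining raises accuracy on the retained samples. Tracking the abstained count over time gives the kind of degradation signal the abstract describes: a rising count suggests the inputs are getting noisier or the model is failing.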

