RL-CAM: Visual Explanations for Convolutional Networks using Reinforcement Learning

2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW) Pub Date : 2023-06-01 DOI:10.1109/CVPRW59228.2023.00400

S. Sarkar, Ashwin Ramesh Babu, Sajad Mousavi, Sahand Ghorbanpour, Vineet Gundecha, Antonio Guillen, Ricardo Luna, Avisek Naug

{"title":"RL-CAM: Visual Explanations for Convolutional Networks using Reinforcement Learning","authors":"S. Sarkar, Ashwin Ramesh Babu, Sajad Mousavi, Sahand Ghorbanpour, Vineet Gundecha, Antonio Guillen, Ricardo Luna, Avisek Naug","doi":"10.1109/CVPRW59228.2023.00400","DOIUrl":null,"url":null,"abstract":"Convolutional Neural Networks (CNNs) are state-of-the-art models for computer vision tasks such as image classification, object detection, and segmentation. However, these models suffer from their inability to explain decisions, particularly in fields like healthcare and security, where interpretability is critical. Previous research has developed various methods for interpreting CNNs, including visualization-based approaches (e.g., saliency maps) that aim to reveal the underlying features used by the model to make predictions. In this work, we propose a novel approach that uses reinforcement learning to generate a visual explanation for CNNs. Our method considers the black-box CNN model and relies solely on the probability distribution of the model’s output to localize the features contributing to a particular prediction. The proposed reinforcement learning algorithm has an agent with two actions, a forward action that explores the input image and identifies the most sensitive region to generate a localization mask, and a reverse action that fine-tunes the localization mask. We evaluate the performance of our approach using multiple image segmentation metrics and compare it with existing visualization-based methods. The experimental results demonstrate that our proposed method outperforms the existing techniques, producing more accurate localization masks of regions of interest in the input images.","PeriodicalId":355438,"journal":{"name":"2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW)","volume":"48 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2023-06-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"5","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/CVPRW59228.2023.00400","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}

引用次数: 5

Abstract

Convolutional Neural Networks (CNNs) are state-of-the-art models for computer vision tasks such as image classification, object detection, and segmentation. However, these models suffer from their inability to explain decisions, particularly in fields like healthcare and security, where interpretability is critical. Previous research has developed various methods for interpreting CNNs, including visualization-based approaches (e.g., saliency maps) that aim to reveal the underlying features used by the model to make predictions. In this work, we propose a novel approach that uses reinforcement learning to generate a visual explanation for CNNs. Our method considers the black-box CNN model and relies solely on the probability distribution of the model’s output to localize the features contributing to a particular prediction. The proposed reinforcement learning algorithm has an agent with two actions, a forward action that explores the input image and identifies the most sensitive region to generate a localization mask, and a reverse action that fine-tunes the localization mask. We evaluate the performance of our approach using multiple image segmentation metrics and compare it with existing visualization-based methods. The experimental results demonstrate that our proposed method outperforms the existing techniques, producing more accurate localization masks of regions of interest in the input images.

查看原文本刊更多论文

RL-CAM:使用强化学习的卷积网络的可视化解释

卷积神经网络(cnn)是计算机视觉任务的最先进模型，如图像分类、目标检测和分割。然而，这些模型无法解释决策，特别是在医疗保健和安全等领域，可解释性至关重要。以前的研究已经开发了各种解释cnn的方法，包括基于可视化的方法(例如，显著性图)，旨在揭示模型用于进行预测的潜在特征。在这项工作中，我们提出了一种新的方法，使用强化学习来生成cnn的视觉解释。我们的方法考虑黑盒CNN模型，并仅依赖于模型输出的概率分布来定位有助于特定预测的特征。所提出的强化学习算法有一个具有两个动作的代理，一个是探索输入图像并识别最敏感区域以生成定位掩码的正向动作，一个是微调定位掩码的反向动作。我们使用多个图像分割指标来评估我们的方法的性能，并将其与现有的基于可视化的方法进行比较。实验结果表明，我们提出的方法优于现有的技术，在输入图像中产生更准确的感兴趣区域的定位掩码。

本文章由计算机程序翻译，如有差异，请以英文原文为准。

求助全文

约1分钟内获得全文求助全文

来源期刊

2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW)

自引率

0.00%

发文量