可视化地解释3D-CNN预测视频分类与自适应遮挡敏感性分析

Tomoki Uchiyama, Naoya Sogi, Koichiro Niinuma, Kazuhiro Fukui
{"title":"可视化地解释3D-CNN预测视频分类与自适应遮挡敏感性分析","authors":"Tomoki Uchiyama, Naoya Sogi, Koichiro Niinuma, Kazuhiro Fukui","doi":"10.1109/wacv56688.2023.00156","DOIUrl":null,"url":null,"abstract":"This paper proposes a method for visually explaining the decision-making process of 3D convolutional neural networks (CNN) with a temporal extension of occlusion sensitivity analysis. The key idea here is to occlude a specific volume of data by a 3D mask in an input 3D temporalspatial data space and then measure the change degree in the output score. The occluded volume data that produces a larger change degree is regarded as a more critical element for classification. However, while the occlusion sensitivity analysis is commonly used to analyze single image classification, it is not so straightforward to apply this idea to video classification as a simple fixed cuboid cannot deal with the motions. To this end, we adapt the shape of a 3D occlusion mask to complicated motions of target objects. Our flexible mask adaptation is performed by considering the temporal continuity and spatial co-occurrence of the optical flows extracted from the input video data. We further propose to approximate our method by using the first-order partial derivative of the score with respect to an input image to reduce its computational cost. We demonstrate the effectiveness of our method through various and extensive comparisons with the conventional methods in terms of the deletion/insertion metric and the pointing metric on the UCF101. The code is available at: https://github.com/uchiyama33/AOSA.","PeriodicalId":497882,"journal":{"name":"2023 IEEE/CVF Winter Conference on Applications of Computer Vision (WACV)","volume":"98 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2023-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"5","resultStr":"{\"title\":\"Visually explaining 3D-CNN predictions for video classification with an adaptive occlusion sensitivity analysis\",\"authors\":\"Tomoki Uchiyama, Naoya Sogi, Koichiro Niinuma, Kazuhiro Fukui\",\"doi\":\"10.1109/wacv56688.2023.00156\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"This paper proposes a method for visually explaining the decision-making process of 3D convolutional neural networks (CNN) with a temporal extension of occlusion sensitivity analysis. The key idea here is to occlude a specific volume of data by a 3D mask in an input 3D temporalspatial data space and then measure the change degree in the output score. The occluded volume data that produces a larger change degree is regarded as a more critical element for classification. However, while the occlusion sensitivity analysis is commonly used to analyze single image classification, it is not so straightforward to apply this idea to video classification as a simple fixed cuboid cannot deal with the motions. To this end, we adapt the shape of a 3D occlusion mask to complicated motions of target objects. Our flexible mask adaptation is performed by considering the temporal continuity and spatial co-occurrence of the optical flows extracted from the input video data. We further propose to approximate our method by using the first-order partial derivative of the score with respect to an input image to reduce its computational cost. We demonstrate the effectiveness of our method through various and extensive comparisons with the conventional methods in terms of the deletion/insertion metric and the pointing metric on the UCF101. The code is available at: https://github.com/uchiyama33/AOSA.\",\"PeriodicalId\":497882,\"journal\":{\"name\":\"2023 IEEE/CVF Winter Conference on Applications of Computer Vision (WACV)\",\"volume\":\"98 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2023-01-01\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"5\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"2023 IEEE/CVF Winter Conference on Applications of Computer Vision (WACV)\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1109/wacv56688.2023.00156\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"2023 IEEE/CVF Winter Conference on Applications of Computer Vision (WACV)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/wacv56688.2023.00156","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 5

摘要

本文提出了一种基于遮挡敏感性分析的时间扩展的三维卷积神经网络(CNN)决策过程可视化解释方法。这里的关键思想是在输入的3D时空数据空间中用3D掩码遮挡特定的数据量,然后测量输出分数的变化程度。产生较大变化程度的被遮挡体数据被视为更关键的分类要素。然而,虽然遮挡敏感性分析通常用于分析单幅图像分类,但将该思想应用于视频分类并不是那么简单,因为简单的固定长方体无法处理运动。为此,我们调整了三维遮挡遮罩的形状,以适应目标物体的复杂运动。我们通过考虑从输入视频数据中提取的光流的时间连续性和空间共现性来实现灵活的掩膜自适应。我们进一步建议通过使用分数相对于输入图像的一阶偏导数来近似我们的方法,以减少其计算成本。我们通过在UCF101上的删除/插入度量和指向度量方面与传统方法进行各种广泛的比较,证明了我们的方法的有效性。代码可从https://github.com/uchiyama33/AOSA获得。
本文章由计算机程序翻译,如有差异,请以英文原文为准。
Visually explaining 3D-CNN predictions for video classification with an adaptive occlusion sensitivity analysis
This paper proposes a method for visually explaining the decision-making process of 3D convolutional neural networks (CNN) with a temporal extension of occlusion sensitivity analysis. The key idea here is to occlude a specific volume of data by a 3D mask in an input 3D temporalspatial data space and then measure the change degree in the output score. The occluded volume data that produces a larger change degree is regarded as a more critical element for classification. However, while the occlusion sensitivity analysis is commonly used to analyze single image classification, it is not so straightforward to apply this idea to video classification as a simple fixed cuboid cannot deal with the motions. To this end, we adapt the shape of a 3D occlusion mask to complicated motions of target objects. Our flexible mask adaptation is performed by considering the temporal continuity and spatial co-occurrence of the optical flows extracted from the input video data. We further propose to approximate our method by using the first-order partial derivative of the score with respect to an input image to reduce its computational cost. We demonstrate the effectiveness of our method through various and extensive comparisons with the conventional methods in terms of the deletion/insertion metric and the pointing metric on the UCF101. The code is available at: https://github.com/uchiyama33/AOSA.
求助全文
通过发布文献求助,成功后即可免费获取论文全文。 去求助
来源期刊
自引率
0.00%
发文量
0
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
确定
请完成安全验证×
copy
已复制链接
快去分享给好友吧!
我知道了
右上角分享
点击右上角分享
0
联系我们:info@booksci.cn Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。 Copyright © 2023 布克学术 All rights reserved.
京ICP备2023020795号-1
ghs 京公网安备 11010802042870号
Book学术文献互助
Book学术文献互助群
群 号:481959085
Book学术官方微信