VADR: Discriminative Multimodal Explanations for Situational Understanding
Harrison Taylor, Liam Hiley, Jack Furby, A. Preece, Dave Braines
2020 IEEE 23rd International Conference on Information Fusion (FUSION), July 2020
DOI: 10.23919/FUSION45008.2020.9190215
This paper focuses on generating multimodal explanations for information fusion tasks performed on multimodal data. We propose that separating modal components in saliency map explanations gives users a better understanding of how convolutional neural networks process multimodal data. We adapt established state-of-the-art explainability techniques to mid-level fusion networks in order to better understand (a) which modality of the input contributes most to a model's decision and (b) which parts of the input data are most relevant to that decision. Our method separates temporal from non-temporal information, allowing a user to focus their attention on salient elements of the scene that are changing across multiple modalities. The work is tested experimentally on an activity recognition task using video and audio data. Because explanations need to be tailored to the type of user in a User Fusion context, we focus on meeting the explanation requirements of system creators and operators respectively.
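To illustrate the general idea of per-modality saliency explanations for a mid-level fusion network, the following sketch applies a standard Grad-CAM-style computation separately to the video and audio branches of a toy two-branch model. This is an assumption-laden illustration, not the authors' VADR implementation: the architecture, layer sizes, and function names are hypothetical stand-ins, and the saliency method is ordinary Grad-CAM applied per branch.

```python
# Minimal sketch (assumption: Grad-CAM-style saliency, not the authors' exact method)
# applied per modality to a toy mid-level fusion network. All names below are
# illustrative and do not come from the VADR paper.
import torch
import torch.nn as nn
import torch.nn.functional as F


class MidFusionNet(nn.Module):
    """Toy mid-level fusion model: a 3D-conv video branch and a 2D-conv
    audio (spectrogram) branch, concatenated before a linear classifier."""

    def __init__(self, n_classes: int = 10):
        super().__init__()
        self.video_branch = nn.Sequential(
            nn.Conv3d(3, 16, 3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool3d((4, 7, 7)),
        )
        self.audio_branch = nn.Sequential(
            nn.Conv2d(1, 16, 3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d((7, 7)),
        )
        self.classifier = nn.Linear(16 * 4 * 7 * 7 + 16 * 7 * 7, n_classes)

    def forward(self, video, audio):
        v = self.video_branch(video)            # (B, 16, 4, 7, 7)
        a = self.audio_branch(audio)            # (B, 16, 7, 7)
        fused = torch.cat([v.flatten(1), a.flatten(1)], dim=1)
        return self.classifier(fused), v, a     # expose branch activations


def per_modality_gradcam(model, video, audio, target_class):
    """Grad-CAM computed independently on each branch's last conv activations,
    yielding one saliency map per modality for the chosen class."""
    logits, v_act, a_act = model(video, audio)
    v_act.retain_grad()
    a_act.retain_grad()
    logits[:, target_class].sum().backward()

    maps = {}
    for name, act in (("video", v_act), ("audio", a_act)):
        # Channel weights = gradients global-average-pooled over space (and time).
        weights = act.grad.flatten(2).mean(dim=2)                        # (B, C)
        weights = weights.view(*weights.shape, *([1] * (act.dim() - 2)))
        maps[name] = F.relu((weights * act).sum(dim=1)).detach()         # video: (B,T,H,W); audio: (B,H,W)
    return maps


if __name__ == "__main__":
    model = MidFusionNet()
    video = torch.randn(1, 3, 16, 112, 112)   # B, C, T, H, W
    audio = torch.randn(1, 1, 64, 64)         # B, 1, mel bins, frames
    cams = per_modality_gradcam(model, video, audio, target_class=3)
    # Rough per-modality contribution: total relevance mass of each map.
    print({name: float(cam.sum()) for name, cam in cams.items()})
```

In this toy setting, comparing the total relevance mass of the two maps gives a coarse answer to (a) which modality contributes most, while the spatial (and, for the video branch, temporal) structure of each map addresses (b) which parts of the input are most relevant; the paper's contribution lies in separating these modal and temporal components so they can be presented to different types of user.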