Evaluation of Explanation Methods of AI - CNNs in Image Classification Tasks with Reference-based and No-reference Metrics

Adv. Artif. Intell. Mach. Learn. Pub Date: 2022-12-02. DOI: 10.54364/AAIML.2023.1143

A. Zhukov, J. Benois-Pineau, R. Giot

Citations: 2

Abstract

The most popular methods in the AI and machine learning paradigm are mainly black boxes, which makes the explanation of AI decisions an urgent need. Although many dedicated explanation tools have been developed, the evaluation of their quality remains an open research question. In this paper, we generalize the methodologies for evaluating post-hoc explainers of CNNs' decisions in visual classification tasks with reference-based and no-reference metrics. We apply them to our previously developed explainers (FEM, MLFEM) and to the popular Grad-CAM. The reference-based metrics are the Pearson correlation coefficient and Similarity, computed between the explanation map and its ground truth, represented by a Gaze Fixation Density Map obtained in a psycho-visual experiment. As a no-reference metric, we use the stability metric proposed by Alvarez-Melis and Jaakkola. We study its behaviour and its consensus with the reference-based metrics, and show that for several kinds of degradation of the input images this metric agrees with the reference-based ones. Therefore, it can be used to evaluate the quality of explainers when the ground truth is not available.
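The three metrics named in the abstract can be illustrated with a minimal sketch. The abstract does not give the formulas, so the definitions below are assumptions: Similarity is taken as the histogram intersection commonly used in saliency benchmarks, and stability as a local Lipschitz estimate (the largest ratio of explanation change to input change over a set of degraded inputs). The function names and the `explain` callable are hypothetical, maps are plain lists of rows, and this is not the authors' implementation.

```python
import math


def flatten(m):
    """Flatten a 2-D map (list of rows) into a flat list of floats."""
    return [float(v) for row in m for v in row]


def pearson_cc(explanation, reference):
    """Pearson correlation coefficient between two maps of equal shape."""
    a, b = flatten(explanation), flatten(reference)
    mean_a, mean_b = sum(a) / len(a), sum(b) / len(b)
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    denom = math.sqrt(var_a * var_b)
    # A constant map has zero variance; report no correlation in that case.
    return cov / denom if denom > 0 else 0.0


def similarity(explanation, reference):
    """Assumed SIM: normalise each non-negative map to sum to 1,
    then sum the pixel-wise minima (histogram intersection)."""
    a, b = flatten(explanation), flatten(reference)
    sum_a, sum_b = sum(a), sum(b)
    if sum_a <= 0 or sum_b <= 0:
        return 0.0
    return sum(min(x / sum_a, y / sum_b) for x, y in zip(a, b))


def stability(explain, image, degraded_images):
    """Assumed stability: max over degraded inputs x' of
    ||explain(x) - explain(x')|| / ||x - x'|| (Euclidean norms)."""
    x = flatten(image)
    base = flatten(explain(image))
    worst = 0.0
    for degraded in degraded_images:
        xd = flatten(degraded)
        dist_in = math.dist(x, xd)
        if dist_in == 0:
            continue  # identical input carries no information
        dist_out = math.dist(base, flatten(explain(degraded)))
        worst = max(worst, dist_out / dist_in)
    return worst


if __name__ == "__main__":
    # Toy 2x2 maps standing in for an explanation map and a GFDM.
    gfdm = [[0.0, 1.0], [2.0, 1.0]]
    expl = [[0.0, 2.0], [4.0, 2.0]]
    print("PCC:", pearson_cc(expl, gfdm))
    print("SIM:", similarity(expl, gfdm))
```

Under these assumptions, higher PCC and SIM indicate closer agreement with the gaze-based ground truth, while a lower stability value indicates an explainer whose maps change less under input degradation; only the latter needs no ground truth.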
