Suraja Poštić, Marko Subašić. Neural Computing and Applications, published 2024-08-16. DOI: 10.1007/s00521-024-10273-4
Extensive evaluation of image classifiers’ interpretations
Saliency maps are input-resolution matrices used to visualize local interpretations of image classifiers. Their pixel values reflect how important the corresponding image locations are for the model's decision. Despite numerous proposals for obtaining such maps, how to evaluate them remains an open question. This paper presents a carefully designed experimental procedure and a set of quantitative interpretation evaluation metrics that rely solely on the behavior of the original model. We attenuate previously observed evaluation biases in three ways: we separate locations with high and low values, we consider the saliency map at full resolution, and we use classifiers with diverse accuracies together with all the classes in the dataset. We used the proposed metrics to compare and analyze seven well-known interpretation methods. Our experiments confirm the importance of the object background and of negative saliency map pixels, and we show that the scale of their impact on the model is comparable to that of positive pixels. We also demonstrate that a good class score interpretation does not necessarily imply a good probability interpretation. Overall, DeepLIFT and LRP-ε were the most successful methods, while Grad-CAM and Ablation-CAM performed very poorly, even at detecting positive relevance. Because these two methods retain only positive values, they also failed to detect irrelevant locations accurately.
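The abstract does not spell out the metrics, so the following is only a minimal sketch of the general idea behind perturbation-based saliency evaluation. It is not the paper's procedure. It rests on our own assumptions: a flattened toy "image", a toy linear classifier, occlusion to a zero baseline, and hypothetical helper names (`saliency_perturbation_test`, `make_linear_model`, `perturb`). The sketch occludes the most positive and the most negative saliency pixels in separate runs. For each run it reports the change in the target class score (logit) and the change in its softmax probability, which is where the two kinds of interpretation can diverge.

```python
import math
from typing import Callable, Dict, List, Sequence

# Toy assumptions: an "image" is a flat list of pixel values and a model maps
# it to a list of class scores (logits). Real classifiers take tensors.
Model = Callable[[Sequence[float]], List[float]]


def softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable softmax: turns class scores into probabilities."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def perturb(image: Sequence[float], indices: Sequence[int], baseline: float) -> List[float]:
    """Return a copy of the image with the given pixels set to a baseline value."""
    out = list(image)
    for i in indices:
        out[i] = baseline
    return out


def saliency_perturbation_test(
    model: Model,
    image: Sequence[float],
    saliency: Sequence[float],
    target: int,
    k: int,
    baseline: float = 0.0,  # assumed occlusion value; other baselines are possible
) -> Dict[str, Dict[str, float]]:
    """Hypothetical helper: occlude the k most positive and the k most negative
    saliency pixels in two separate runs and report how the target class
    score (logit) and probability change in each run."""
    order = sorted(range(len(saliency)), key=lambda i: saliency[i])
    most_negative = [i for i in order if saliency[i] < 0][:k]
    most_positive = [i for i in reversed(order) if saliency[i] > 0][:k]

    base_logits = model(image)
    base_prob = softmax(base_logits)[target]
    results: Dict[str, Dict[str, float]] = {}
    for name, indices in (("positive", most_positive), ("negative", most_negative)):
        logits = model(perturb(image, indices, baseline))
        results[name] = {
            "score_change": logits[target] - base_logits[target],
            "prob_change": softmax(logits)[target] - base_prob,
        }
    return results


def make_linear_model(weights: Sequence[Sequence[float]]) -> Model:
    """Toy linear classifier: one weight vector per class, no bias."""
    def model(x: Sequence[float]) -> List[float]:
        return [sum(w_i * x_i for w_i, x_i in zip(w, x)) for w in weights]
    return model


# Usage on a 4-pixel toy input with two classes.
weights = [[2.0, -1.0, 0.5, 0.0], [0.0, 1.0, 0.0, 1.0]]
model = make_linear_model(weights)
x = [1.0, 1.0, 1.0, 1.0]
# Stand-in saliency map for class 0: weight * input, i.e. each pixel's exact
# contribution to a linear logit. Not the output of any specific method.
saliency = [w * xi for w, xi in zip(weights[0], x)]
print(saliency_perturbation_test(model, x, saliency, target=0, k=1))
```

In this toy setup, occluding pixel 0 (the most positive) lowers the class-0 logit by 2.0. Occluding pixel 1 (the most negative) raises it by 1.0, so negative pixels affect the model too. Pixel 3 has zero saliency for class 0, yet occluding it still raises the class-0 probability, because it lowers the class-1 logit. A map that explains the score well can therefore miss effects on the probability.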