Looking Inside the Black-Box: Logic-based Explanations for Neural Networks

João Ferreira, Manuel de Sousa Ribeiro, Ricardo Gonçalves, João Leite
{"title":"Looking Inside the Black-Box: Logic-based Explanations for Neural Networks","authors":"João Ferreira, Manuel de Sousa Ribeiro, Ricardo Gonçalves, João Leite","doi":"10.24963/kr.2022/45","DOIUrl":null,"url":null,"abstract":"Deep neural network-based methods have recently enjoyed great popularity due to their effectiveness in solving difficult tasks. Requiring minimal human effort, they have turned into an almost ubiquitous solution in multiple domains. However, due to the size and complexity of typical neural network models' architectures, as well as the sub-symbolical nature of the representations generated by their neuronal activations, neural networks are essentially opaque, making it nearly impossible to explain to humans the reasoning behind their decisions. We address this issue by developing a procedure to induce human-understandable logic-based theories that attempt to represent the classification process of a given neural network model, based on the idea of establishing mappings from the values of the activations produced by the neurons of that model to human-defined concepts to be used in the induced logic-based theory. Exploring the setting of a synthetic image classification task, we provide empirical results to assess the quality of the developed theories for different neural network models, compare them to existing theories on that task, and give evidence that the theories developed through our method are faithful to the representations learned by the neural networks that they are built to describe.","PeriodicalId":351970,"journal":{"name":"Proceedings of the Nineteenth International Conference on Principles of Knowledge Representation and Reasoning","volume":"86 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2022-07-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"19","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Proceedings of the Nineteenth International Conference on Principles of Knowledge Representation and Reasoning","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.24963/kr.2022/45","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 19

Abstract

Deep neural network-based methods have recently enjoyed great popularity due to their effectiveness in solving difficult tasks. Requiring minimal human effort, they have become an almost ubiquitous solution in multiple domains. However, due to the size and complexity of typical neural network architectures, as well as the sub-symbolic nature of the representations generated by their neuronal activations, neural networks are essentially opaque, making it nearly impossible to explain to humans the reasoning behind their decisions. We address this issue by developing a procedure to induce human-understandable logic-based theories that attempt to represent the classification process of a given neural network model. The procedure is based on the idea of establishing mappings from the values of the activations produced by the neurons of that model to human-defined concepts, which are then used in the induced logic-based theory. In the setting of a synthetic image classification task, we provide empirical results to assess the quality of the theories developed for different neural network models, compare them to existing theories on that task, and give evidence that the theories developed through our method are faithful to the representations learned by the neural networks they are built to describe.
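
To make the two-step idea concrete, below is a minimal, hypothetical sketch of such a pipeline, not the authors' actual procedure. The activation matrix, the concept names, the per-concept probe classifiers, and the use of a depth-limited decision tree (read out as if-then rules) as a stand-in for the paper's logic-theory induction are all illustrative assumptions.

```python
# Illustrative sketch only: data is synthetic and the rule inducer
# (a shallow decision tree) is a stand-in for the paper's method.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.tree import DecisionTreeClassifier, export_text

rng = np.random.default_rng(0)

# Stand-ins for quantities that would come from a real model and dataset:
#   acts     - hidden-layer activations of the trained network (n x d)
#   concepts - human-annotated truth values of k named concepts (n x k)
#   net_out  - the network's predicted class for each input (n,)
n, d = 1000, 32
acts = rng.normal(size=(n, d))
concepts = (acts[:, :3] > 0).astype(int)               # toy ground truth
net_out = (concepts[:, 0] & concepts[:, 1]).astype(int)
concept_names = ["square", "circle", "triangle"]       # hypothetical concepts

# Step 1: map activation values to human-defined concepts,
# here with one probe classifier per concept.
probes = [LogisticRegression(max_iter=1000).fit(acts, concepts[:, j])
          for j in range(concepts.shape[1])]
truth_values = np.column_stack([p.predict(acts) for p in probes])

# Step 2: induce a human-readable theory over those concepts that mimics
# the network's classification; a depth-limited tree keeps the rules short.
theory = DecisionTreeClassifier(max_depth=3).fit(truth_values, net_out)
print(export_text(theory, feature_names=concept_names))

# Faithfulness check: the fraction of inputs on which the induced theory
# agrees with the network's own prediction.
print("fidelity:", (theory.predict(truth_values) == net_out).mean())
```

The final fidelity score is one simple way to quantify the faithfulness claim in the abstract: a theory is only a useful explanation insofar as it reproduces the decisions of the network it is meant to describe.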