Looking Inside the Black-Box: Logic-based Explanations for Neural Networks
João Ferreira, Manuel de Sousa Ribeiro, Ricardo Gonçalves, João Leite
Proceedings of the Nineteenth International Conference on Principles of Knowledge Representation and Reasoning (KR 2022), July 2022. DOI: 10.24963/kr.2022/45
Deep neural network-based methods have recently enjoyed great popularity due to their effectiveness in solving difficult tasks. Requiring minimal human effort, they have become an almost ubiquitous solution in multiple domains. However, due to the size and complexity of typical neural network architectures, and the sub-symbolic nature of the representations generated by their neuronal activations, neural networks are essentially opaque, making it nearly impossible to explain to humans the reasoning behind their decisions. We address this issue by developing a procedure to induce human-understandable, logic-based theories that attempt to represent the classification process of a given neural network model. The procedure is based on the idea of establishing mappings from the values of the activations produced by the neurons of that model to human-defined concepts, which are then used in the induced logic-based theory. Exploring the setting of a synthetic image classification task, we provide empirical results to assess the quality of the theories developed for different neural network models, compare them to existing theories on that task, and give evidence that the theories developed through our method are faithful to the representations learned by the neural networks they are built to describe.
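To make the high-level idea concrete, the sketch below illustrates one way such a pipeline could look in code: probe the hidden-layer activations of a trained network for human-defined concepts, then fit a small, readable rule model over those concepts against the network's own predictions and measure how faithfully it mimics them. This is a toy illustration under stated assumptions (a scikit-learn MLP, logistic-regression concept probes, and a depth-2 decision tree read out as rules), not the authors' actual procedure from the paper.

```python
# Illustrative sketch only: map hidden activations to human-defined concepts,
# then induce a small rule-like theory over those concepts.
# Model choice, concept probes, and the rule learner are assumptions for
# illustration; they are not the method described in the paper.

import numpy as np
from sklearn.neural_network import MLPClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.tree import DecisionTreeClassifier, export_text

rng = np.random.default_rng(0)

# Synthetic data: two "ground-truth" binary concepts determine the class label.
n = 2000
concepts = rng.integers(0, 2, size=(n, 2))            # human-defined concepts
X = concepts + 0.3 * rng.normal(size=(n, 2))          # noisy observable features
y = concepts[:, 0] & concepts[:, 1]                   # class = concept_0 AND concept_1

# 1) Train the opaque model we want to explain.
model = MLPClassifier(hidden_layer_sizes=(8,), max_iter=2000, random_state=0).fit(X, y)

# 2) Collect hidden-layer activations for each input (manual forward pass
#    through the single ReLU hidden layer of the MLP).
def hidden_activations(mlp, inputs):
    z = inputs @ mlp.coefs_[0] + mlp.intercepts_[0]
    return np.maximum(z, 0)

H = hidden_activations(model, X)

# 3) Map activations to human-defined concepts: one simple probe per concept.
concept_probes = [LogisticRegression(max_iter=1000).fit(H, concepts[:, i]) for i in range(2)]
predicted_concepts = np.column_stack([p.predict(H) for p in concept_probes])

# 4) Induce a small, human-readable theory over the predicted concepts that
#    mimics the network's own predictions (fidelity to the model, not ground truth).
network_predictions = model.predict(X)
theory = DecisionTreeClassifier(max_depth=2, random_state=0).fit(
    predicted_concepts, network_predictions
)
print(export_text(theory, feature_names=["concept_0", "concept_1"]))

# 5) Fidelity: how often the induced theory agrees with the network.
fidelity = (theory.predict(predicted_concepts) == network_predictions).mean()
print(f"fidelity to the network: {fidelity:.2%}")
```

The printed tree can be read as if/then rules over the concepts (e.g. "if concept_0 and concept_1 then class 1"), and the fidelity score plays the role of checking whether such a theory is faithful to the network it describes, which is the kind of evaluation the abstract refers to.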