CausaLM: Causal Model Explanation Through Counterfactual Language Models

IF 5.3 | Zone 2, Computer Science | Q2 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE

Computational Linguistics Pub Date : 2020-05-27 DOI:10.1162/coli_a_00404

Amir Feder, Nadav Oved, Uri Shalit, Roi Reichart

Volume 47 1, pages 333-386. Publication type: Journal Article.

Citations: 97

Abstract

Understanding predictions made by deep neural networks is notoriously difficult, but also crucial to their dissemination. As all machine learning–based methods, they are as good as their training data, and can also capture unwanted biases. While there are tools that can help understand whether such biases exist, they do not distinguish between correlation and causation, and might be ill-suited for text-based models and for reasoning about high-level language concepts. A key problem of estimating the causal effect of a concept of interest on a given model is that this estimation requires the generation of counterfactual examples, which is challenging with existing generation technology. To bridge that gap, we propose CausaLM, a framework for producing causal model explanations using counterfactual language representation models. Our approach is based on fine-tuning of deep contextualized embedding models with auxiliary adversarial tasks derived from the causal graph of the problem. Concretely, we show that by carefully choosing auxiliary adversarial pre-training tasks, language representation models such as BERT can effectively learn a counterfactual representation for a given concept of interest, and be used to estimate its true causal effect on model performance. A byproduct of our method is a language representation model that is unaffected by the tested concept, which can be useful in mitigating unwanted bias ingrained in the data.
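The abstract does not spell out the estimator, so the following is only a minimal sketch of one plausible way to turn the idea into a number. It assumes a classifier has already been run twice on the same examples, once on top of the original representation and once on top of the counterfactual one, and it measures how far the predicted class probabilities move on average. The function name `average_prediction_shift` and the toy probabilities are hypothetical, chosen for illustration only.

```python
from typing import Sequence


def average_prediction_shift(
    original_probs: Sequence[Sequence[float]],
    counterfactual_probs: Sequence[Sequence[float]],
) -> float:
    """Mean L1 distance between paired class-probability vectors.

    Assumption (not taken from the abstract): the effect of a concept is
    approximated by how much predictions change when the classifier uses a
    representation that no longer encodes that concept.
    """
    if len(original_probs) != len(counterfactual_probs):
        raise ValueError("both inputs must cover the same examples")
    if not original_probs:
        raise ValueError("at least one example is required")

    total = 0.0
    for p, q in zip(original_probs, counterfactual_probs):
        if len(p) != len(q):
            raise ValueError("paired vectors must have the same number of classes")
        # L1 distance between the two predictions for this example.
        total += sum(abs(a - b) for a, b in zip(p, q))
    return total / len(original_probs)


# Toy, made-up predictions for two examples and two classes.
original = [[0.9, 0.1], [0.2, 0.8]]
counterfactual = [[0.6, 0.4], [0.3, 0.7]]
effect = average_prediction_shift(original, counterfactual)
print(f"estimated shift: {effect:.2f}")
```

A value near zero would suggest the predictions barely depend on the removed concept; larger values suggest a stronger dependence. Whether this matches the estimator used in the paper should be checked against the full text.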


Source journal

Computational Linguistics (Engineering and Technology: Computer Science, Interdisciplinary Applications)

CiteScore: 15.80

Self-citation rate: 0.00%

Review time: >12 weeks

About the journal: Computational Linguistics, the longest-running publication dedicated solely to the computational and mathematical aspects of language and the design of natural language processing systems, provides university and industry linguists, computational linguists, AI and machine learning researchers, cognitive scientists, speech specialists, and philosophers with the latest insights into the computational aspects of language research.