Interpretability Analysis for Named Entity Recognition to Understand System Predictions and How They Can Improve

IF 3.7 | Zone 2 (Computer Science) | Q2 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE

Computational Linguistics Pub Date : 2020-04-09 DOI:10.1162/coli_a_00397

Oshin Agarwal, Yinfei Yang, Byron C. Wallace, A. Nenkova

Computational Linguistics, volume 47, issue 1, pages 117-140. https://doi.org/10.1162/coli_a_00397

Citations: 28

Abstract

Named entity recognition systems achieve remarkable performance on domains such as English news. It is natural to ask: What are these models actually learning to achieve this? Are they merely memorizing the names themselves? Or are they capable of interpreting the text and inferring the correct entity type from the linguistic context? We examine these questions by contrasting the performance of several variants of architectures for named entity recognition, with some provided only representations of the context as features. We experiment with GloVe-based BiLSTM-CRF as well as BERT. We find that context does influence predictions, but the main factor driving high performance is learning the named tokens themselves. Furthermore, we find that BERT is not always better at recognizing predictive contexts compared to a BiLSTM-CRF model. We enlist human annotators to evaluate the feasibility of inferring entity types from context alone and find that humans are also mostly unable to infer entity types for the majority of examples on which the context-only system made errors. However, there is room for improvement: A system should be able to recognize any named entity in a predictive context correctly and our experiments indicate that current systems may be improved by such capability. Our human study also revealed that systems and humans do not always learn the same contextual clues, and context-only systems are sometimes correct even when humans fail to recognize the entity type from the context. Finally, we find that one issue contributing to model errors is the use of “entangled” representations that encode both contextual and local token information into a single vector, which can obscure clues. Our results suggest that designing models that explicitly operate over representations of local inputs and context, respectively, may in some cases improve performance. In light of these and related findings, we highlight directions for future work.
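The distinction the abstract draws between the named token itself and its surrounding context can be illustrated with a minimal sketch. This is not the authors' implementation (the paper works with GloVe-based BiLSTM-CRF and BERT representations); it only shows, on raw tokens, what keeping the two sources of evidence separate rather than "entangled" might look like. The function names `context_only_view` and `split_features` and the `window` parameter are hypothetical and chosen for illustration.

```python
from typing import Dict, List, Tuple, Union


def context_only_view(
    tokens: List[str], index: int, window: int = 2
) -> Tuple[List[str], List[str]]:
    """Return the left and right neighbours of tokens[index].

    The target token itself is deliberately excluded, so a model fed
    only this view cannot rely on having memorized the name.
    """
    left = tokens[max(0, index - window):index]
    right = tokens[index + 1:index + 1 + window]
    return left, right


def split_features(
    tokens: List[str], index: int, window: int = 2
) -> Dict[str, Union[str, List[str]]]:
    """Keep word identity and context as separate fields.

    Hypothetical illustration of a "disentangled" input: the local
    token and its context are not merged into a single vector.
    """
    left, right = context_only_view(tokens, index, window)
    return {"word": tokens[index], "left": left, "right": right}


sentence = ["Dr.", "Smith", "visited", "Paris", "yesterday"]
features = split_features(sentence, 1)
print(features)
```

A context-only system in this sketch would see only the `left` and `right` fields; for "Smith", the title "Dr." is the kind of predictive context from which a person type could be inferred without knowing the name.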




Source journal

Computational Linguistics (Engineering & Technology: Computer Science, Interdisciplinary Applications)

CiteScore

15.80

Self-citation rate

0.00%

Articles published

Review time

>12 weeks

Journal description: Computational Linguistics, the longest-running publication dedicated solely to the computational and mathematical aspects of language and the design of natural language processing systems, provides university and industry linguists, computational linguists, AI and machine learning researchers, cognitive scientists, speech specialists, and philosophers with the latest insights into the computational aspects of language research.