T2-NER: A Two-Stage Span-Based Framework for Unified Named Entity Recognition with Templates

IF 4.2 | Zone 1, Computer Science | JCR Q2: COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE

Transactions of the Association for Computational Linguistics Pub Date : 2023-01-01 DOI:10.1162/tacl_a_00602

Peixin Huang, Xiang Zhao, Minghao Hu, Zhen Tan, Weidong Xiao


Citations: 0

Abstract



Named Entity Recognition (NER) has evolved from traditional flat NER to overlapped and discontinuous NER. These three tasks have mostly been solved separately, with only a few exceptions that tackle all three concurrently with a single model. The current best-performing method formalizes unified NER as word-word relation classification, which barely focuses on mention content learning and fails to detect entity mentions comprising a single word. In this paper, we propose a two-stage span-based framework with templates, namely T2-NER, to resolve the unified NER task. The first stage extracts entity spans, where flat and overlapped entities can be recognized. The second stage classifies over all entity span pairs, where discontinuous entities can be recognized. Finally, multi-task learning is used to jointly train the two stages. To improve the efficiency of the span-based model, we design grouped templates and typed templates for the two stages to realize batch computations. We also apply an adjacent packing strategy and a latter packing strategy to model discriminative boundary information and learn better span (pair) representations. Moreover, we introduce syntax information to enhance our span representation. We perform extensive experiments on eight benchmark datasets for flat, overlapped, and discontinuous NER, where our model beats all the current competitive baselines, obtaining the best performance on unified NER.
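The two-stage control flow described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function names (`enumerate_spans`, `two_stage_decode`), the `max_len` cap, and the two rule-based stand-in classifiers are all hypothetical, and the templates, packing strategies, syntax features, and joint training are omitted entirely. It only shows how a first stage that keeps candidate spans can recover flat and overlapped mentions, and how a second stage that classifies pairs of extracted spans can link the fragments of a discontinuous mention.

```python
from itertools import combinations
from typing import Callable, List, Sequence, Tuple

Span = Tuple[int, int]  # inclusive token indices (start, end)


def enumerate_spans(n_tokens: int, max_len: int) -> List[Span]:
    """List every candidate span of at most max_len tokens (overlaps allowed)."""
    return [
        (start, end)
        for start in range(n_tokens)
        for end in range(start, min(start + max_len, n_tokens))
    ]


def two_stage_decode(
    tokens: Sequence[str],
    is_entity_span: Callable[[Sequence[str], Span], bool],
    same_entity: Callable[[Sequence[str], Span, Span], bool],
    max_len: int = 4,
) -> Tuple[List[Span], List[Tuple[Span, Span]]]:
    # Stage 1: keep the spans accepted by the (hypothetical) span classifier.
    # Because spans are scored independently, nested/overlapped spans can coexist.
    spans = [s for s in enumerate_spans(len(tokens), max_len)
             if is_entity_span(tokens, s)]
    # Stage 2: classify every pair of extracted spans; a linked pair is read
    # as two fragments of one discontinuous mention.
    pairs = [(a, b) for a, b in combinations(spans, 2)
             if same_entity(tokens, a, b)]
    return spans, pairs


# Toy stand-ins for the learned classifiers (assumptions, for illustration only).
tokens = ["muscle", "pain", "and", "fatigue"]
gold_spans = {(0, 1), (0, 0), (3, 3)}   # "muscle pain", "muscle", "fatigue"
gold_pairs = {((0, 0), (3, 3))}         # "muscle ... fatigue"

spans, pairs = two_stage_decode(
    tokens,
    is_entity_span=lambda toks, s: s in gold_spans,
    same_entity=lambda toks, a, b: (a, b) in gold_pairs,
)
print(spans)  # spans kept by stage 1
print(pairs)  # span pairs linked by stage 2
```

In this toy sentence, stage 1 keeps the contiguous span for "muscle pain" together with the single-token spans "muscle" and "fatigue", and stage 2 links the latter two into one discontinuous mention. Enumerating all spans and all span pairs is quadratic in each case, which is the cost the abstract's batch computations with templates are meant to reduce.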


Source journal

Transactions of the Association for Computational Linguistics Multiple-

CiteScore: 32.60

Self-citation rate: 4.60%

Articles published:

Review time: 8 weeks

About the journal: The highly regarded quarterly journal Computational Linguistics has a companion journal called Transactions of the Association for Computational Linguistics. This open access journal publishes articles in all areas of natural language processing and is an important resource for academic and industry computational linguists, natural language processing experts, artificial intelligence and machine learning investigators, cognitive scientists, speech specialists, as well as linguists and philosophers. The journal disseminates work of vital relevance to these professionals on an annual basis.