USTC-NELSLIP at SemEval-2023 Task 2: Statistical Construction and Dual Adaptation of Gazetteer for Multilingual Complex NER

Jun Ma, Jia-Chen Gu, Jiajun Qi, Zhen-Hua Ling, Quan Liu, Xiaoyi Zhao
{"title":"USTC-NELSLIP at SemEval-2023 Task 2: Statistical Construction and Dual Adaptation of Gazetteer for Multilingual Complex NER","authors":"Jun Ma, Jia-Chen Gu, Jiajun Qi, Zhen-Hua Ling, Quan Liu, Xiaoyi Zhao","doi":"10.48550/arXiv.2305.02517","DOIUrl":null,"url":null,"abstract":"This paper describes the system developed by the USTC-NELSLIP team for SemEval-2023 Task 2 Multilingual Complex Named Entity Recognition (MultiCoNER II). We propose a method named Statistical Construction and Dual Adaptation of Gazetteer (SCDAG) for Multilingual Complex NER. The method first utilizes a statistics-based approach to construct a gazetteer. Secondly, the representations of gazetteer networks and language models are adapted by minimizing the KL divergence between them at the sentence-level and entity-level. Finally, these two networks are then integrated for supervised named entity recognition (NER) training. The proposed method is applied to several state-of-the-art Transformer-based NER models with a gazetteer built from Wikidata, and shows great generalization ability across them. The final predictions are derived from an ensemble of these trained models. Experimental results and detailed analysis verify the effectiveness of the proposed method. The official results show that our system ranked 1st on one track (Hindi) in this task.","PeriodicalId":444285,"journal":{"name":"International Workshop on Semantic Evaluation","volume":"255 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2023-05-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"2","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"International Workshop on Semantic Evaluation","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.48550/arXiv.2305.02517","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 2

Abstract

This paper describes the system developed by the USTC-NELSLIP team for SemEval-2023 Task 2 Multilingual Complex Named Entity Recognition (MultiCoNER II). We propose a method named Statistical Construction and Dual Adaptation of Gazetteer (SCDAG) for Multilingual Complex NER. The method first utilizes a statistics-based approach to construct a gazetteer. Secondly, the representations of gazetteer networks and language models are adapted by minimizing the KL divergence between them at the sentence-level and entity-level. Finally, these two networks are then integrated for supervised named entity recognition (NER) training. The proposed method is applied to several state-of-the-art Transformer-based NER models with a gazetteer built from Wikidata, and shows great generalization ability across them. The final predictions are derived from an ensemble of these trained models. Experimental results and detailed analysis verify the effectiveness of the proposed method. The official results show that our system ranked 1st on one track (Hindi) in this task.
任务2:多语种复杂NER地名词典的统计构建与双重适应
本文介绍了由中国科学技术大学(USTC-NELSLIP)团队为SemEval-2023任务2多语言复杂命名实体识别(MultiCoNER II)开发的系统,提出了一种多语言复杂命名实体识别的统计构建和词典双适应(SCDAG)方法。该方法首先利用基于统计的方法来构建地名表。其次,通过最小化句子级和实体级之间的KL差异来适应地名词典网络和语言模型的表示。最后,将这两个网络结合起来进行有监督命名实体识别(NER)训练。将该方法应用于基于Wikidata的地语表的几种最先进的基于变压器的NER模型,并在它们之间显示出良好的泛化能力。最终的预测是从这些训练模型的集合中得出的。实验结果和详细分析验证了该方法的有效性。官方结果显示,在这项任务中,我们的系统在一条赛道(印地语)上排名第一。
本文章由计算机程序翻译,如有差异,请以英文原文为准。
求助全文
约1分钟内获得全文 求助全文
来源期刊
自引率
0.00%
发文量
0
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
确定
请完成安全验证×
copy
已复制链接
快去分享给好友吧!
我知道了
右上角分享
点击右上角分享
0
联系我们:info@booksci.cn Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。 Copyright © 2023 布克学术 All rights reserved.
京ICP备2023020795号-1
ghs 京公网安备 11010802042870号
Book学术文献互助
Book学术文献互助群
群 号:481959085
Book学术官方微信