Unsupervised Named Entity Recognition for Hi-Tech Domain

Abinaya Govindan, Gyan Ranjan, Amit Verma
DOI: 10.5121/csit.2021.111917 (https://doi.org/10.5121/csit.2021.111917)
Journal: NLP Techniques and Applications
Publication date: 2021-11-27
Publication type: Journal Article
Citations: 1

Abstract

This paper presents named entity recognition as a multi-answer QA task combined with contextual natural-language-inference-based noise reduction. This method allows us to use pre-trained models, already trained on certain downstream tasks, to generate unsupervised data, reducing the manual annotation needed to create token-level named entity tags. For each entity, we provide a unique context, such as entity types, definitions, questions, and a few empirical rules, along with the target text, to train a named entity model for the domain of interest. This formulation (a) allows the system to jointly learn NER-specific features from the provided datasets, (b) can extract multiple NER-specific features, thereby boosting the performance of existing NER models, and (c) provides business-contextualized definitions to reduce ambiguity among similar entities. We conducted numerous tests to determine the quality of the created data, and we find that this method of data generation allows us to obtain clean, noise-free data with minimal effort and time. This approach has been demonstrated to be successful in extracting named entities, which are then used in subsequent components.
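The two key ideas in the abstract, framing each entity as a question over the target text and then filtering candidate spans with natural-language inference, can be sketched in code. The prompt template, hypothesis wording, and threshold below are illustrative assumptions, not the authors' exact formulation; `entail_score` stands in for any NLI model wrapper.

```python
# Sketch of NER as multi-answer QA with NLI-based noise filtering.
# Prompt format, hypothesis template, and threshold are assumptions
# for illustration, not the paper's exact recipe.
from typing import Callable, List


def build_qa_prompt(entity_type: str, definition: str,
                    question: str, text: str) -> str:
    """Combine the per-entity context (type, definition, question)
    with the target text, as the paper describes."""
    return (f"Entity type: {entity_type}\n"
            f"Definition: {definition}\n"
            f"Question: {question}\n"
            f"Context: {text}")


def filter_candidates(candidates: List[str], text: str, entity_type: str,
                      entail_score: Callable[[str, str], float],
                      threshold: float = 0.7) -> List[str]:
    """Keep candidate spans whose entailment hypothesis, checked
    against the source text as premise, scores above the threshold."""
    kept = []
    for span in candidates:
        hypothesis = f"'{span}' is a {entity_type}."
        if entail_score(text, hypothesis) >= threshold:
            kept.append(span)
    return kept
```

In practice `entail_score` would wrap a pre-trained NLI model, and the QA prompt would be fed to a pre-trained extractive QA model that returns multiple answer spans; the filter then discards noisy spans whose entity-type hypothesis the text does not entail.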