Unsupervised Named Entity Recognition for Hi-Tech Domain

Abinaya Govindan, Gyan Ranjan, Amit Verma
DOI: 10.5121/csit.2021.111917 (https://doi.org/10.5121/csit.2021.111917)
Journal: NLP Techniques and Applications
Publication date: 2021-11-27
Publication type: Journal Article
Citations: 1

Abstract

This paper presents named entity recognition as a multi-answer QA task combined with contextual natural-language-inference-based noise reduction. This method allows us to use pre-trained models, already trained on certain downstream tasks, to generate unsupervised data, reducing the manual annotation needed to create token-level named entity tags. For each entity, we provide a unique context, such as entity types, definitions, questions, and a few empirical rules, along with the target text, to train a named entity model for the domain of interest. This formulation (a) allows the system to jointly learn NER-specific features from the provided datasets, (b) can extract multiple NER-specific features, thereby boosting the performance of existing NER models, and (c) provides business-contextualized definitions to reduce ambiguity among similar entities. We conducted numerous tests to determine the quality of the created data, and we find that this method of data generation allows us to obtain clean, noise-free data with minimal effort and time. This approach has been demonstrated to be successful in extracting named entities, which are then used in subsequent components.
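The two key ideas in the abstract, framing each entity as a question over the target text and then filtering candidate spans with natural-language inference, can be sketched in code. The prompt template, hypothesis wording, and threshold below are illustrative assumptions, not the authors' exact formulation; `entail_score` stands in for any NLI model wrapper.

```python
# Sketch of NER as multi-answer QA with NLI-based noise filtering.
# Prompt format, hypothesis template, and threshold are assumptions
# for illustration, not the paper's exact recipe.
from typing import Callable, List


def build_qa_prompt(entity_type: str, definition: str,
                    question: str, text: str) -> str:
    """Combine the per-entity context (type, definition, question)
    with the target text, as the paper describes."""
    return (f"Entity type: {entity_type}\n"
            f"Definition: {definition}\n"
            f"Question: {question}\n"
            f"Context: {text}")


def filter_candidates(candidates: List[str], text: str, entity_type: str,
                      entail_score: Callable[[str, str], float],
                      threshold: float = 0.7) -> List[str]:
    """Keep candidate spans whose entailment hypothesis, checked
    against the source text as premise, scores above the threshold."""
    kept = []
    for span in candidates:
        hypothesis = f"'{span}' is a {entity_type}."
        if entail_score(text, hypothesis) >= threshold:
            kept.append(span)
    return kept
```

In practice `entail_score` would wrap a pre-trained NLI model, and the QA prompt would be fed to a pre-trained extractive QA model that returns multiple answer spans; the filter then discards noisy spans whose entity-type hypothesis the text does not entail.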