Extracting Chinese Domain-specific Open Entity and Relation by Using Learning Patterns

Proceedings of the ACM Turing Celebration Conference - China | Pub Date: 2020-05-22 | DOI: 10.1145/3393527.3393548

Hongying Wen, Zhiguang Wang, Qiang Lu


Citations: 2

Abstract

Chinese domain-specific relation extraction faces a major challenge: the lack of annotated data. Distant supervision, which automatically labels large-scale training data, was proposed to address this. However, distant supervision produces noisy data, which hurts the performance of models trained on it. Although significant progress has been made in filtering noisy data, distant supervision can only extract relations that already exist in the knowledge base. A second major challenge in extracting domain-specific entities and relations is their diversity, which makes it difficult to predefine relations accurately in the knowledge base. Distant supervision is therefore poorly suited to domain-specific extraction. To overcome these challenges, this paper proposes a Chinese Domain-specific Open Entity Relation Extraction Model (DOERE), which learns patterns from a small amount of annotated data and applies the learned extraction patterns to a new domain-specific corpus to extract entities and relations. The paper also proposes a method for automatically labeling data based on patterns. The experimental results show that the model achieves better precision and recall on large-scale domain-specific data, and that the pattern-based automatic labeling method is effective for labeling domain-specific data.
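The abstract does not say how DOERE represents or learns its patterns, so the following is only an illustrative sketch of the general idea (learn patterns from a few annotated sentences, apply them to new text, and reuse the matches as labels), not the authors' method. It assumes, purely for illustration, that a pattern is a sequence of part-of-speech tags with a role attached to each position; the names `learn_patterns`, `extract_triples`, `label_tokens`, the role tags `E1`/`REL`/`E2`/`O`, and the example sentences are all hypothetical.

```python
from collections import Counter
from typing import Iterator, List, Tuple

Token = Tuple[str, str]                 # (word, POS tag); the tag set is assumed
Pattern = Tuple[Tuple[str, str], ...]   # ((POS tag, role), ...)


def learn_patterns(annotated: List[Tuple[List[Token], List[str]]],
                   min_count: int = 1) -> List[Pattern]:
    """Collect POS/role sequences spanning the annotated part of each sentence."""
    counts: Counter = Counter()
    for tokens, roles in annotated:
        marked = [i for i, role in enumerate(roles) if role != "O"]
        if not marked:
            continue
        lo, hi = marked[0], marked[-1]
        counts[tuple((tokens[i][1], roles[i]) for i in range(lo, hi + 1))] += 1
    return [p for p, c in counts.items() if c >= min_count]


def find_matches(tokens: List[Token],
                 patterns: List[Pattern]) -> Iterator[Tuple[int, Pattern]]:
    """Yield (start index, pattern) wherever a pattern's POS sequence occurs."""
    tags = [tag for _, tag in tokens]
    for pattern in patterns:
        wanted = [tag for tag, _ in pattern]
        for start in range(len(tokens) - len(pattern) + 1):
            if tags[start:start + len(pattern)] == wanted:
                yield start, pattern


def extract_triples(tokens: List[Token],
                    patterns: List[Pattern]) -> List[Tuple[str, str, str]]:
    """Turn every pattern match into an (entity1, relation, entity2) triple."""
    triples = []
    for start, pattern in find_matches(tokens, patterns):
        parts = {"E1": [], "REL": [], "E2": []}
        for offset, (_, role) in enumerate(pattern):
            if role in parts:
                parts[role].append(tokens[start + offset][0])
        # Chinese words are joined without spaces.
        triples.append(tuple("".join(parts[k]) for k in ("E1", "REL", "E2")))
    return triples


def label_tokens(tokens: List[Token], patterns: List[Pattern]) -> List[str]:
    """Automatically label a sentence using the first matching pattern."""
    roles = ["O"] * len(tokens)
    for start, pattern in find_matches(tokens, patterns):
        for offset, (_, role) in enumerate(pattern):
            roles[start + offset] = role
        break
    return roles


# One hand-annotated sentence (made up for this sketch).
annotated = [([("钻井液", "n"), ("包含", "v"), ("膨润土", "n")],
              ["E1", "REL", "E2"])]
patterns = learn_patterns(annotated)

# A new, unlabeled sentence (also made up).
new_sentence = [("该", "r"), ("水泥", "n"), ("含有", "v"), ("石灰", "n")]
triples = extract_triples(new_sentence, patterns)
labels = label_tokens(new_sentence, patterns)
print(triples)
print(labels)
```

Because the relation word is read from the text and not chosen from a predefined list, a sketch like this can return relations that appear in no knowledge base, which is the limitation of distant supervision the abstract points to. Matching on POS tags alone would over-generate on real text, so a practical system would need richer patterns and some form of filtering.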


