Extracting Chinese Domain-specific Open Entity and Relation by Using Learning Patterns

Hongying Wen, Zhiguang Wang, Qiang Lu
Proceedings of the ACM Turing Celebration Conference - China, published 2020-05-22
DOI: 10.1145/3393527.3393548 (https://doi.org/10.1145/3393527.3393548)
Citation count: 2

Abstract

Chinese domain-specific relation extraction faces a major challenge: the lack of annotated data. Distant supervision, which can automatically label large-scale training data, was proposed to address this. However, distant supervision produces noisy data that degrades the performance of models trained on it. Although significant progress has been made in filtering noisy data, distant supervision can only extract relations that already exist in the knowledge base. A second major challenge in extracting domain-specific entities and relations is their diversity, which makes it difficult to accurately predefine relations in a knowledge base. Distant supervision is therefore ill-suited to domain-specific extraction. To overcome these challenges, this paper proposes a Chinese Domain-specific Open Entity Relation Extraction Model (DOERE), which learns patterns from a small amount of annotated data and applies the learned extraction patterns to a new domain-specific corpus to extract entities and relations. The paper also proposes a method for automatically labeling data based on patterns. Experimental results show that the model achieves better precision and recall on a large-scale domain-specific corpus, and that the pattern-based automatic labeling method is effective for labeling domain-specific data.
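
The abstract does not specify how DOERE represents or learns its patterns. As a rough illustration only, the general idea of learning extraction patterns from a few annotated sentences, applying them to new text, and reusing the matches as automatic labels might be sketched as follows. Everything here is an assumption made for illustration: patterns are taken to be part-of-speech tag sequences, sentences are assumed to be already segmented and tagged by an external tool, and all names (Pattern, learn_patterns, extract, auto_label) and example sentences are hypothetical rather than taken from the paper.

```python
from collections import Counter
from typing import Iterator, List, NamedTuple, Tuple

Token = Tuple[str, str]  # (word, POS tag); segmentation and tagging are assumed done upstream


class Pattern(NamedTuple):
    """Hypothetical pattern: a POS tag sequence plus the offsets of the triple's parts."""
    tags: Tuple[str, ...]
    e1: int   # offset of the first entity inside the matched span
    rel: int  # offset of the relation word
    e2: int   # offset of the second entity


def learn_patterns(annotated, min_count: int = 1) -> List[Pattern]:
    """Collect one pattern per annotated (tokens, e1_idx, rel_idx, e2_idx) example."""
    counts = Counter()
    for tokens, e1, rel, e2 in annotated:
        start, end = min(e1, rel, e2), max(e1, rel, e2)
        tags = tuple(pos for _, pos in tokens[start:end + 1])
        counts[Pattern(tags, e1 - start, rel - start, e2 - start)] += 1
    # Keep only patterns seen at least min_count times.
    return [p for p, c in counts.items() if c >= min_count]


def _match_starts(tokens: List[Token], pattern: Pattern) -> Iterator[int]:
    """Yield every start index where the pattern's tag sequence occurs."""
    tags = tuple(pos for _, pos in tokens)
    n = len(pattern.tags)
    for i in range(len(tags) - n + 1):
        if tags[i:i + n] == pattern.tags:
            yield i


def extract(tokens: List[Token], patterns: List[Pattern]) -> List[Tuple[str, str, str]]:
    """Apply learned patterns to a new sentence and return (entity, relation, entity) triples."""
    return [
        (tokens[i + p.e1][0], tokens[i + p.rel][0], tokens[i + p.e2][0])
        for p in patterns
        for i in _match_starts(tokens, p)
    ]


def auto_label(tokens: List[Token], patterns: List[Pattern]) -> List[str]:
    """Turn pattern matches into per-token labels (E1, REL, E2, or O)."""
    labels = ["O"] * len(tokens)
    for p in patterns:
        for i in _match_starts(tokens, p):
            labels[i + p.e1], labels[i + p.rel], labels[i + p.e2] = "E1", "REL", "E2"
    return labels


# One hypothetical annotated sentence: indices 0, 1, 2 are entity, relation, entity.
annotated = [([("钻井液", "n"), ("包含", "v"), ("膨润土", "n")], 0, 1, 2)]
patterns = learn_patterns(annotated)

# A new, unannotated sentence (also hypothetical).
sentence = [("该", "r"), ("油田", "n"), ("采用", "v"), ("注水", "n"), ("工艺", "n")]
triples = extract(sentence, patterns)
labels = auto_label(sentence, patterns)
```

Because nothing in this sketch depends on a predefined relation inventory, the relation word is read directly from the matched text, which is the sense in which such extraction is "open". A practical system would need far richer patterns and filtering than exact tag-sequence matching.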