Extracting Chinese Domain-specific Open Entity and Relation by Using Learning Patterns
Hongying Wen, Zhiguang Wang, Qiang Lu
Proceedings of the ACM Turing Celebration Conference - China, 2020-05-22
DOI: 10.1145/3393527.3393548 (https://doi.org/10.1145/3393527.3393548)
Citations: 2
Abstract
Chinese domain-specific relation extraction faces a major challenge: the lack of annotated data. Distant supervision, which automatically labels large-scale training data, was proposed to address this. However, distant supervision produces noisy labels, which degrade the performance of models trained on them. Although significant progress has been made in filtering this noise, distant supervision can only extract relations that already exist in the knowledge base. A second major challenge in extracting domain-specific entities and relations is their diversity, which makes it difficult to predefine relations accurately in a knowledge base. Distant supervision is therefore not applicable to domain-specific extraction. To overcome these challenges, this paper proposes a Chinese Domain-specific Open Entity Relation Extraction Model (DOERE), which learns extraction patterns from a small amount of annotated data and applies them to a new domain-specific corpus to extract entities and relations. The paper also proposes a method for automatically labeling data based on these patterns. Experimental results show that the model achieves better precision and recall on large-scale data from a specific domain, and that the pattern-based automatic labeling method is effective for labeling data in a specific domain.
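To make the idea of learning patterns from a few annotated sentences and applying them to new text concrete, the following is a minimal Python sketch. It is not the DOERE model: the abstract does not specify how patterns are represented, so this sketch assumes, purely for illustration, that sentences are already word-segmented and POS-tagged and that a pattern is a POS-tag sequence with marked head, relation, and tail positions. The names `learn_pattern` and `apply_patterns`, the tag set, and the example sentences are all hypothetical.

```python
from typing import List, Tuple

# (word, POS tag); segmentation and tagging are assumed to happen upstream.
Token = Tuple[str, str]
# (POS sequence, head offset, relation offset, tail offset) -- an assumed pattern format.
Pattern = Tuple[Tuple[str, ...], int, int, int]
Triple = Tuple[str, str, str]


def learn_pattern(tokens: List[Token], head: int, rel: int, tail: int) -> Pattern:
    """Generalize one annotated sentence into a POS-sequence pattern.

    head, rel and tail are token indices of the annotated entity,
    relation word and second entity.
    """
    lo, hi = min(head, rel, tail), max(head, rel, tail)
    pos_seq = tuple(pos for _, pos in tokens[lo:hi + 1])
    return pos_seq, head - lo, rel - lo, tail - lo


def apply_patterns(tokens: List[Token], patterns: List[Pattern]) -> List[Triple]:
    """Slide each pattern over a tagged sentence and emit (head, relation, tail)."""
    tags = [pos for _, pos in tokens]
    triples: List[Triple] = []
    for pos_seq, h, r, t in patterns:
        n = len(pos_seq)
        for start in range(len(tags) - n + 1):
            if tuple(tags[start:start + n]) == pos_seq:
                triples.append(
                    (tokens[start + h][0], tokens[start + r][0], tokens[start + t][0])
                )
    return triples


# Hypothetical annotated seed sentence: "compressor drives turbine".
seed = [("压缩机", "n"), ("驱动", "v"), ("涡轮", "n")]
patterns = [learn_pattern(seed, head=0, rel=1, tail=2)]

# Hypothetical unlabeled sentence: "sensor monitors temperature."
new_sentence = [("传感器", "n"), ("监测", "v"), ("温度", "n"), ("。", "w")]
triples = apply_patterns(new_sentence, patterns)
print(triples)
```

In this toy setting, each sentence that matches a pattern, together with the triple it yields, can also be treated as an automatically labeled example, which is one plausible reading of the pattern-based labeling method the abstract mentions. A pattern as coarse as a bare noun-verb-noun tag sequence would over-generate on real text; it is used here only to keep the sketch short.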