A Joint Entity-Relation Detection and Generalization Method Based on Syntax and semantic for Chinese Intangible Cultural Heritage Texts

IF 2.1; Zone 3 (Computer Science); JCR Q3 (COMPUTER SCIENCE, INTERDISCIPLINARY APPLICATIONS)
Yuyao Tan, Hao Wang, Zibo Zhao, Tao Fan
ACM Journal on Computing and Cultural Heritage, volume 15 4. Published 2023-11-02. Journal Article. DOI: 10.1145/3631124 (https://doi.org/10.1145/3631124)
Citation count: 0

Abstract

[Purpose/Significance] Annotating a natural language corpus not only helps researchers extract knowledge from it but also enables deeper mining of the corpus. However, annotated corpora in the humanities are scarce, and semantic annotation of humanities texts is difficult because it demands substantial domain knowledge from researchers and may even require the participation of domain experts. This study therefore proposes a method for detecting entities and relations in a domain that lacks annotated corpora, and offers a reference approach for constructing conceptual models from textual instances. [Method/Process] Based on syntactic and semantic features, this study proposes SPO (subject-predicate-object) triple recognition rules that give priority to predicates, and generalization rules based on a triple's content and the meaning of its predicate. The recognition rules extract text-descriptive SPO triples centered on predicates. After the triples are clustered and adjusted, the generalization rules are applied to obtain coarse-grained entities and relations, which then form a conceptual model. [Results/Conclusions] This study recognizes SPO triples from descriptive texts with high precision and a high degree of summarization, generalizes them, and forms a domain conceptual model. The proposed method offers a research approach for entity-relation detection in domains that lack annotated corpora, and the resulting domain conceptual model provides a reference for building a domain Linked Data Graph. The feasibility of the method is verified in practice on texts related to the four traditional Chinese festivals.
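The predicate-first idea in the abstract can be illustrated with a minimal sketch. This is not the authors' rule set, which the abstract does not spell out. It assumes a toy, hand-written dependency parse with Universal Dependencies-style labels (`nsubj`, `obj`), English glosses in place of Chinese text, and two hypothetical lookup tables (`ENTITY_TYPES`, `RELATION_TYPES`) standing in for the clustering and generalization steps. All names below are illustrative.

```python
from collections import Counter
from typing import NamedTuple


class Token(NamedTuple):
    idx: int    # 1-based position in the sentence
    text: str
    pos: str    # coarse part-of-speech tag
    head: int   # index of the syntactic head (0 = root)
    dep: str    # dependency label relative to the head


def extract_spo(tokens):
    """Predicate-first extraction: visit each verb, then collect its subjects and objects."""
    triples = []
    for pred in (t for t in tokens if t.pos == "VERB"):
        subjects = [t.text for t in tokens if t.head == pred.idx and t.dep == "nsubj"]
        objects = [t.text for t in tokens if t.head == pred.idx and t.dep == "obj"]
        triples.extend((s, pred.text, o) for s in subjects for o in objects)
    return triples


# Hypothetical lookup tables (assumptions, not taken from the paper).
ENTITY_TYPES = {"people": "Person", "families": "Person",
                "zongzi": "Food", "mooncakes": "Food"}
RELATION_TYPES = {"eat": "consumes", "share": "consumes"}


def generalize(triples):
    """Map instance-level triples to coarse (type, relation, type) patterns and count them."""
    return Counter(
        (ENTITY_TYPES.get(s, "Thing"), RELATION_TYPES.get(p, p), ENTITY_TYPES.get(o, "Thing"))
        for s, p, o in triples
    )


# Hand-written toy parses; a real pipeline would obtain these from a dependency parser.
sent1 = [Token(1, "people", "NOUN", 2, "nsubj"),
         Token(2, "eat", "VERB", 0, "root"),
         Token(3, "zongzi", "NOUN", 2, "obj")]
sent2 = [Token(1, "families", "NOUN", 2, "nsubj"),
         Token(2, "share", "VERB", 0, "root"),
         Token(3, "mooncakes", "NOUN", 2, "obj")]

triples = extract_spo(sent1) + extract_spo(sent2)
model = generalize(triples)
print(triples)
print(model)
```

In this sketch both instance-level triples collapse into a single coarse pattern, which is the kind of entity-relation pair a conceptual model would record. Starting from the predicate means a sentence with no verb yields no triple, and a verb missing either a subject or an object is skipped.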
Source journal
ACM Journal on Computing and Cultural Heritage (Arts and Humanities: Conservation)
CiteScore: 4.60
Self-citation rate: 8.30%
Articles published: 90
About the journal: ACM Journal on Computing and Cultural Heritage (JOCCH) publishes papers of significant and lasting value in all areas relating to the use of information and communication technologies (ICT) in support of Cultural Heritage. The journal encourages the submission of manuscripts that demonstrate innovative use of technology for the discovery, analysis, interpretation and presentation of cultural material, as well as manuscripts that illustrate applications in the Cultural Heritage sector that challenge the computational technologies and suggest new research opportunities in computer science.