A New Unsupervised Algorithm for Extracting Relationship Words Between Two Entities

Fan Wu, Taihao Zheng, L. Yao, Honghai Feng
Published in: 2021 3rd International Conference on Advances in Computer Technology, Information Science and Communication (CTISC)
Publication date: 2021-04-01
DOI: 10.1109/CTISC52352.2021.00037 (https://doi.org/10.1109/CTISC52352.2021.00037)
Citations: 0

Abstract

Purpose: To extract relationships between concepts (triple relation extraction) with a popular supervised learning algorithm such as BERT, the relationship types must be labeled manually. If some relation words are not labeled in the training stage, they will probably not be recognized in the test stage, and the corresponding entities cannot be recognized either. This paper proposes a new unsupervised algorithm that extracts as many relation words between two entities as possible, especially those that are easily overlooked.

Methods: The disease-cause relationship was taken as an example, and 10,204 valid sentences describing diseases and their corresponding causes were collected with a web crawler. Relation words were extracted in an unsupervised manner under syntactic, semantic, and lexical constraints, and the automatically extracted results were summarized.

Results: Some specific relation words that are ignored in the manual labeling stage were found; conjoined relation words that often appear together in the texts were recognized; and some types and features of relation words were obtained. These types and features can help relation labeling in the supervised learning stage, help expand the relevant knowledge graphs, and improve the accuracy of information retrieval.
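The abstract does not spell out the actual syntactic, semantic, and lexical constraints, so the following is only a minimal sketch of the general idea, not the authors' algorithm. It assumes that the disease and cause mentions of each sentence are already known, takes the tokens lying between the two mentions as candidates, and filters them with two stand-in constraints: a maximum gap length and a stopword list. The toy sentences, `STOPWORDS`, `max_gap`, and both function names are hypothetical.

```python
from collections import Counter
from itertools import combinations

# Hypothetical toy data: (sentence, disease mention, cause mention).
SAMPLES = [
    ("Lung cancer is mainly caused by smoking", "Lung cancer", "smoking"),
    ("Smoking often leads to lung cancer", "lung cancer", "Smoking"),
    ("Cirrhosis is caused by hepatitis B", "Cirrhosis", "hepatitis B"),
    ("Obesity leads to diabetes", "diabetes", "Obesity"),
    ("Gastritis is induced and aggravated by alcohol", "Gastritis", "alcohol"),
]

# Assumed lexical constraint: function words that cannot be relation words.
STOPWORDS = {"is", "are", "to", "by", "the", "a", "of", "and", "often", "mainly"}


def between_tokens(sentence, entity_a, entity_b, max_gap=5):
    """Return the lower-cased tokens between two entity mentions.

    An empty list is returned when either mention is missing or when the
    gap is longer than max_gap tokens (a crude stand-in for a syntactic
    constraint).
    """
    text = sentence.lower()
    a, b = entity_a.lower(), entity_b.lower()
    i, j = text.find(a), text.find(b)
    if i < 0 or j < 0:
        return []
    # Slice the text between whichever mention comes first and the other one.
    gap = text[i + len(a):j] if i < j else text[j + len(b):i]
    tokens = gap.split()
    return tokens if len(tokens) <= max_gap else []


def candidate_relation_words(samples, stopwords=STOPWORDS):
    """Count candidate relation words and pairs that co-occur in one gap."""
    word_counts = Counter()
    pair_counts = Counter()
    for sentence, disease, cause in samples:
        words = [t for t in between_tokens(sentence, disease, cause)
                 if t not in stopwords]
        word_counts.update(words)
        # Pairs of distinct candidates found between the same two entities.
        pair_counts.update(combinations(sorted(set(words)), 2))
    return word_counts, pair_counts


if __name__ == "__main__":
    words, pairs = candidate_relation_words(SAMPLES)
    print(words.most_common())
    print(pairs.most_common())
```

Counting candidates over the whole corpus, rather than matching a fixed list, is what lets rarely labeled relation words surface; the pair counter is one simple way to notice relation words that tend to appear together.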