A new unsupervised Algorithm for extracting relationship words between two entities
Authors: Fan Wu, Taihao Zheng, L. Yao, Honghai Feng
Published in: 2021 3rd International Conference on Advances in Computer Technology, Information Science and Communication (CTISC)
Publication date: 2021-04-01
DOI: 10.1109/CTISC52352.2021.00037 (https://doi.org/10.1109/CTISC52352.2021.00037)
Citations: 0
Abstract
Purpose: Using a popular supervised learning approach such as BERT to extract relationships between concepts (triple relation extraction) requires relation types to be labeled manually. If a relation word is not labeled in the training stage, it will probably not be recognized in the test stage, and the corresponding entities will go unrecognized as well. This paper proposes a new unsupervised algorithm that extracts as many relation words as possible between two entities, especially those that are easily overlooked. Methods: Taking the disease-cause relationship as an example, 10,204 valid sentences describing diseases and their corresponding causes were collected with a web crawler. Relation words were extracted in an unsupervised manner under syntactic, semantic, and lexical constraints, and the automatically extracted results were summarized. Results: Some specific relation words that were ignored in the manual labeling stage were found; conjoined relation words that often appear together in the texts were recognized; and several types and features of relation words were obtained. These types and features can support relation labeling in the supervised learning stage, help expand the relevant knowledge graphs, and improve the accuracy of information retrieval.
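The abstract does not spell out the extraction rules, so the following is only a minimal sketch of the general idea, not the authors' algorithm. It assumes the two entity mentions are already known for each sentence, treats the words between them as the candidate relation phrase, and applies a simple lexical filter. The stop list `STOPWORDS`, the length limit `max_len`, and all function names are hypothetical, and the syntactic and semantic constraints mentioned in the abstract are not modeled here.

```python
from collections import Counter

# Hypothetical stop list; the paper's actual lexical constraints are not
# described in the abstract.
STOPWORDS = {"the", "a", "an", "of", "is", "are", "be", "can", "may", "often", "usually"}


def between_span(sentence, entity_a, entity_b):
    """Return the lowercase tokens strictly between two entity mentions, or None."""
    text = sentence.lower()
    a, b = entity_a.lower(), entity_b.lower()
    ia, ib = text.find(a), text.find(b)
    if ia == -1 or ib == -1:
        return None  # one of the entities is not mentioned
    if ia < ib:
        start, end = ia + len(a), ib
    else:
        start, end = ib + len(b), ia
    if start > end:
        return None  # the two mentions overlap
    return text[start:end].split()


def candidate_relation_phrase(sentence, disease, cause, max_len=4):
    """Keep the in-between words that survive the lexical filter (assumed rule)."""
    tokens = between_span(sentence, disease, cause)
    if tokens is None:
        return None
    kept = [t.strip(".,;:()") for t in tokens]
    kept = [t for t in kept if t and t not in STOPWORDS]
    if not kept or len(kept) > max_len:
        return None  # nothing left, or too long to be a relation phrase
    return " ".join(kept)


def count_relation_phrases(examples):
    """Tally candidate relation phrases over (sentence, disease, cause) triples."""
    counts = Counter()
    for sentence, disease, cause in examples:
        phrase = candidate_relation_phrase(sentence, disease, cause)
        if phrase:
            counts[phrase] += 1
    return counts
```

Ranking the resulting counts by frequency gives a rough list of candidate relation words to review, which is the kind of summary that could inform manual relation labeling for a supervised model.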