RePersian: An Efficient Open Information Extraction Tool in Persian
Raana Saheb-Nassagh, Majid Asgari, B. Minaei-Bidgoli
2020 6th International Conference on Web Research (ICWR), published 2020-04-01
DOI: 10.1109/ICWR49608.2020.9122301
Citations: 4
Abstract
Relation extraction is the task of extracting semantic information from raw data. A key requirement for open information extraction systems is the ability to extract relations automatically in any domain, which matters especially in web mining and web research. Relation extraction has been studied extensively in many languages, and many relation extraction algorithms rely on parse trees. Persian, a low-resource language, has a dependency grammar and lexical structure that make dependency parsing difficult or time-consuming, which in many cases slows down relation extraction. In this paper, we introduce RePersian, a fast method for relation extraction in Persian. Our approach is based on the part-of-speech (POS) tags of a sentence and a set of relation patterns, which we derived by analyzing sentence structures in Persian. RePersian searches a sentence's POS tag sequence for these relation patterns, which are expressed as regular expressions, and identifies a semantic relation wherever the POS sequence matches a relation pattern. We test and evaluate our method on Dadegan, the Persian dependency treebank, using two different POS tag sets. On average, our approach achieved a precision of 78.05% in identifying the first argument of a relation, 80.4% in identifying the second argument, and 54.85% in identifying the correct relation between the two arguments.
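The core idea, matching regular-expression relation patterns against a sentence's POS tag sequence, can be illustrated with a small sketch. This is not RePersian's implementation: the tag names, the one-character tag encoding, the two patterns, and the helpers `encode`, `extract`, and `Triple` are all hypothetical, chosen only to show how a regex over POS tags can yield (argument 1, relation, argument 2) triples. The paper's actual patterns and its two Dadegan tag sets are not reproduced here.

```python
import re
from typing import List, NamedTuple, Tuple

# Hypothetical mapping from coarse POS tags to one-character codes.
# Using exactly one character per token lets regex offsets double as token indices.
TAG_CODES = {
    "N": "n",      # noun
    "PR": "n",     # pronoun, treated like a noun in this sketch
    "ADJ": "a",    # adjective
    "PREP": "p",   # preposition
    "POSTP": "r",  # postposition, e.g. the object marker را
    "V": "v",      # verb
}
OTHER = "x"  # code for any tag not listed above

# Hypothetical noun phrase: a noun followed by zero or more adjectives.
NP = "na*"

# Hypothetical relation patterns over the encoded tag string (Persian is SOV).
PATTERNS = [
    # Subject NP, object NP + object marker, verb:  N ADJ* N ADJ* POSTP V
    re.compile(rf"(?P<arg1>{NP})(?P<arg2>{NP})r(?P<rel>v)"),
    # Subject NP, preposition + NP, verb:  N ADJ* PREP N ADJ* V
    re.compile(rf"(?P<arg1>{NP})p(?P<arg2>{NP})(?P<rel>v)"),
]


class Triple(NamedTuple):
    arg1: str
    rel: str
    arg2: str


def encode(tags: List[str]) -> str:
    """Map each POS tag to one character; unknown tags become OTHER."""
    return "".join(TAG_CODES.get(t, OTHER) for t in tags)


def extract(tokens: List[Tuple[str, str]]) -> List[Triple]:
    """Return an (arg1, rel, arg2) triple for every pattern match in the sentence."""
    words = [w for w, _ in tokens]
    code = encode([t for _, t in tokens])

    def span(m, group):
        # Character offsets equal token indices because each tag is one character.
        return " ".join(words[m.start(group):m.end(group)])

    triples = []
    for pattern in PATTERNS:
        for m in pattern.finditer(code):
            triples.append(Triple(span(m, "arg1"), span(m, "rel"), span(m, "arg2")))
    return triples
```

For example, the tagged sentence علی کتاب را خرید ("Ali bought the book"), tagged N, N, POSTP, V, encodes to `nnrv`; the first pattern matches and `extract` yields one triple with arg1 علی, rel خرید, and arg2 کتاب. Encoding one character per token keeps the mapping from match offsets back to words trivial; a multi-character encoding would need an explicit offset table.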