RePersian: An Efficient Open Information Extraction Tool in Persian

2020 6th International Conference on Web Research (ICWR). Pub Date: 2020-04-01. DOI: 10.1109/ICWR49608.2020.9122301

Raana Saheb-Nassagh, Majid Asgari, B. Minaei-Bidgoli


Citations: 4

Abstract

Relation extraction is the task of extracting semantic information from raw data. A key requirement for open information extraction systems is the ability to extract relation information automatically for any domain, especially in web mining and web research. Much research has been done on relation extraction in different languages, and many relation extraction algorithms are based on parse trees. Persian, a low-resource language, has a dependency grammar and lexical structure that make dependency parsing difficult or time-consuming, which in many cases slows down relation extraction. In this paper, we introduce RePersian, a fast method for relation extraction in Persian. The method is based on the part-of-speech (POS) tags of a sentence and a set of particular relation patterns. To obtain these patterns, we analyzed sentence structures in the Persian language. RePersian searches the POS tags of a sentence for the relation patterns, which are given as regular expressions. In this way, RePersian finds semantic relations by matching the POS tag sequence of a sentence against a relation pattern. We test and evaluate our method on Dadegan, a Persian dependency tree dataset, with two different POS tag sets. On average, our approach achieved a precision of 78.05% in finding the first argument of a relation, 80.4% in finding the second argument, and 54.85% in finding the correct relation between the arguments.
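The idea of matching regular expressions against a POS tag sequence can be illustrated with a minimal sketch. It is not the RePersian implementation: the abstract does not list the actual patterns or tag sets, so the tag names (`N`, `ADJ`, `POSTP`, `V`), the single pattern, the function `extract_relations`, and the transliterated example sentence are all assumptions made for illustration. Each tag is encoded as one character so that regex match offsets map directly onto token positions.

```python
import re

# Hypothetical coarse tag set, encoded one character per tag so that a
# regex offset in the code string equals a token index in the sentence.
TAG_CODES = {"N": "n", "ADJ": "a", "POSTP": "p", "V": "v"}

# Hypothetical pattern for a simple subject-object-verb clause:
# noun phrase (arg1), noun phrase (arg2), optional postposition, verb(s).
PATTERN = re.compile(r"(?P<arg1>n+a*)(?P<arg2>n+a*)p?(?P<rel>v+)")


def extract_relations(tagged, pattern=PATTERN):
    """Return (arg1, relation, arg2) triples from a list of (token, tag) pairs."""
    # Unknown tags become "x", which the pattern above never matches.
    codes = "".join(TAG_CODES.get(tag, "x") for _, tag in tagged)
    triples = []
    for match in pattern.finditer(codes):
        def words(group):
            # Map the matched tag span back to the original tokens.
            start, end = match.start(group), match.end(group)
            return " ".join(token for token, _ in tagged[start:end])

        triples.append((words("arg1"), words("rel"), words("arg2")))
    return triples


# Transliterated example: "Ali ketab ra kharid" ("Ali bought the book").
sentence = [("Ali", "N"), ("ketab", "N"), ("ra", "POSTP"), ("kharid", "V")]
print(extract_relations(sentence))
```

Because the sketch only scans a tag string with a compiled regular expression, it needs no dependency parse, which is the source of the speed advantage the abstract describes. A realistic system would need many patterns and a full tag set.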

