RePersian: An Efficient Open Information Extraction Tool in Persian

Raana Saheb-Nassagh, Majid Asgari, B. Minaei-Bidgoli
2020 6th International Conference on Web Research (ICWR), April 2020. DOI: 10.1109/ICWR49608.2020.9122301
Citations: 4

Abstract

Relation extraction is the task of extracting semantic information from raw data. A key requirement for open information extraction systems is the ability to extract relation information automatically in any domain, especially for web mining and web research. Much research has been done on relation extraction in different languages, and many relation extraction algorithms rely on parse trees. Persian, a low-resource language, has a dependency grammar and lexical structure that make dependency parsing difficult or time-consuming, which in many cases slows down relation extraction. In this paper, we introduce RePersian, a fast method for relation extraction in Persian. Our approach is based on the part-of-speech (POS) tags of a sentence and on specific relation patterns, which we derived by analyzing sentence structures in Persian. RePersian searches a sentence's POS-tag sequence for these relation patterns, which are expressed as regular expressions, and finds semantic relations by matching POS-tag sequences to relation patterns. We evaluate our method on the Dadegan Persian dependency treebank using two different POS tag-sets. On average, our approach achieved a precision of 78.05% in finding the first argument of a relation, 80.4% in finding the second argument, and 54.85% in finding the correct relation between the arguments.
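The abstract describes finding relations by matching regular-expression patterns against a sentence's POS-tag sequence rather than against a parse tree. The following is a minimal Python sketch of that general idea, not RePersian's implementation: the coarse Dadegan-style tag names, the one-letter tag encoding, the single SOV pattern, and the helper names `extract` and `_group_text` are all hypothetical illustrations, and example words are transliterated for readability.

```python
import re
from typing import Dict, List, Tuple

# Hypothetical mapping from coarse, Dadegan-style POS tags to one-letter codes.
# Encoding exactly one character per token means regex match offsets are token indices.
TAG_CODE: Dict[str, str] = {
    "N": "n",      # noun
    "PR": "n",     # pronoun, treated as nominal (assumption)
    "ADJ": "a",    # adjective
    "PREP": "p",   # preposition
    "POSTP": "r",  # postposition, e.g. the object marker "ra"
    "V": "v",      # verb
    "ADV": "d",    # adverb
    "CONJ": "c",   # conjunction
    "PUNC": "u",   # punctuation
}
UNKNOWN = "x"  # any other tag; never matched by the pattern below

# Hypothetical relation pattern for a simple SOV clause:
#   arg1 = noun + adjectives, arg2 = optional preposition + noun + adjectives,
#   then an optional object marker, and the relation = the clause-final verb(s).
RELATION_PATTERN = re.compile(r"(?P<arg1>na*)(?P<arg2>p?na*)r?(?P<rel>v+)")

Triple = Tuple[str, str, str]


def _group_text(match: re.Match, name: str, words: List[str]) -> str:
    """Join the words covered by a named group of the tag-string match."""
    start, end = match.span(name)
    return " ".join(words[start:end])


def extract(tagged: List[Tuple[str, str]]) -> List[Triple]:
    """Return (arg1, relation, arg2) triples from a POS-tagged sentence."""
    words = [word for word, _ in tagged]
    encoded = "".join(TAG_CODE.get(tag, UNKNOWN) for _, tag in tagged)
    return [
        (
            _group_text(m, "arg1", words),
            _group_text(m, "rel", words),
            _group_text(m, "arg2", words),
        )
        for m in RELATION_PATTERN.finditer(encoded)
    ]
```

For example, the transliterated sentence *Ali ketab ra khand* ("Ali read the book"), tagged N N POSTP V, is encoded as `nnrv` and yields the triple `("Ali", "khand", "ketab")`. Encoding one character per token keeps the mapping from regex offsets back to words trivial, so the only per-sentence cost is a linear scan with a compiled pattern, with no parsing step involved.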