ParsiPayesh: Persian Plagiarism Detection based on Semantic and Structural Analysis

2020 10th International Conference on Computer and Knowledge Engineering (ICCKE). Pub Date: 2020-10-29. DOI: 10.1109/ICCKE50421.2020.9303672

S. Lazemi, H. Ebrahimpour-Komleh

Citations: 2

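Abstract

In recent years, the rapid growth of Persian electronic resources and the ease of accessing them have seriously aggravated plagiarism in the Iranian scientific community. Although automatic plagiarism detection systems such as Turnitin and Eve2 exist, the problem persists because they lack support for Persian. The main purpose of this article is to detect both exact (verbatim) plagiarism and rewritten plagiarism in Persian scientific texts. In the proposed method, candidate sources are first retrieved based on statistical features; then, in the text alignment step, structural and semantic analyses of the expressions are performed to detect rewritten plagiarism. First, a data-driven dependency parser for Persian is improved with the help of a deep learning model to analyze the structure of each expression, and the degree of structural similarity between expressions is then evaluated by analyzing their dependency trees. To examine the semantic similarity of expressions, we propose using the semantic role labels produced by the presented deep learning model. Experiments were performed on the corpora prepared for the AAIC2015 and PAN2015 competitions. The results indicate that structural and semantic information improves the performance of the proposed method. ParsiPayesh is available at http://www.parsipayesh.ir.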
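The abstract names the two pipeline stages (statistical candidate retrieval, then text alignment using dependency-tree structure and semantic role labels) but gives no formulas, so the Python below is only a minimal sketch under stated assumptions, not the ParsiPayesh implementation. Assumed here: candidate retrieval is approximated by word-trigram Jaccard overlap with an arbitrary `threshold`; dependency trees and semantic role labels are represented as hand-written (head, relation, dependent) and (predicate, role, argument) triples instead of real output from the authors' Persian parser and SRL model; and the two similarities are mixed with a hypothetical weight `w_struct`. All function names are illustrative.

```python
# Hypothetical sketch of a two-stage plagiarism-detection pipeline:
# (1) candidate retrieval by word n-gram overlap, (2) text alignment that
# mixes structural (dependency) and semantic (SRL) similarity.
# Parser and SRL outputs are hand-written toy triples, not real model output.


def word_ngrams(tokens, n=3):
    """Set of contiguous word n-grams in a token list."""
    return {tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}


def jaccard(a, b):
    """Jaccard overlap |a & b| / |a | b| of two sets (0.0 if both are empty)."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def retrieve_candidates(suspicious, sources, n=3, threshold=0.1):
    """Return ids of source documents whose n-gram overlap with the
    suspicious document reaches the (assumed) threshold."""
    query = word_ngrams(suspicious, n)
    return [sid for sid, tokens in sources.items()
            if jaccard(query, word_ngrams(tokens, n)) >= threshold]


def structural_similarity(deps_a, deps_b):
    """Dependency trees compared as sets of (head, relation, dependent) edges."""
    return jaccard(set(deps_a), set(deps_b))


def semantic_similarity(frames_a, frames_b):
    """SRL output compared as sets of (predicate, role, argument) triples."""
    return jaccard(set(frames_a), set(frames_b))


def alignment_score(deps_a, deps_b, frames_a, frames_b, w_struct=0.5):
    """Weighted mix of structural and semantic similarity (weight is assumed)."""
    return (w_struct * structural_similarity(deps_a, deps_b)
            + (1 - w_struct) * semantic_similarity(frames_a, frames_b))


# Stage 1: candidate retrieval on toy token lists.
suspicious = "plagiarism detection for persian scientific texts is hard".split()
sources = {
    "s1": "automatic plagiarism detection for persian scientific texts".split(),
    "s2": "deep learning improves dependency parsing".split(),
}
candidates = retrieve_candidates(suspicious, sources)
print(candidates)  # ['s1']

# Stage 2: an active sentence and its passive rewrite
# ("the student copied the paper" / "the paper was copied by the student").
active_deps = [("copy", "nsubj", "student"), ("copy", "obj", "paper")]
passive_deps = [("copy", "nsubj:pass", "paper"), ("copy", "obl", "student")]
# Semantic roles survive the voice change even though the tree edges do not.
frames = [("copy", "ARG0", "student"), ("copy", "ARG1", "paper")]

print(structural_similarity(active_deps, passive_deps))  # 0.0
print(semantic_similarity(frames, frames))               # 1.0
print(alignment_score(active_deps, passive_deps, frames, frames))  # 0.5
```

The active/passive pair shows why semantic roles can help with rewritten text: the two dependency trees share no edges, while the role triples match exactly, so a purely structural score would miss the rewrite.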

Source journal: 2020 10th International Conference on Computer and Knowledge Engineering (ICCKE)

Self-citation rate: 0.00%
