ParsiPayesh: Persian Plagiarism Detection based on Semantic and Structural Analysis
S. Lazemi, H. Ebrahimpour-Komleh
2020 10th International Conference on Computer and Knowledge Engineering (ICCKE), 2020-10-29
DOI: 10.1109/ICCKE50421.2020.9303672 (https://doi.org/10.1109/ICCKE50421.2020.9303672)
In recent years, the rapid growth of Persian electronic resources and the ease of access to them have seriously aggravated the plagiarism problem in the Iranian scientific community. Although automatic plagiarism detection systems such as Turnitin and Eve2 exist, the problem persists because these systems lack support for Persian. The main purpose of this article is to detect exact plagiarism and rewriting in Persian scientific texts. In the proposed method, candidate retrieval based on statistical characteristics is followed by a text alignment step, in which structural and semantic analysis of expressions is performed to detect rewritten plagiarism. First, a data-driven dependency parser is improved with the help of a deep learning model for the Persian language to analyze the structure of each expression, and the degree of structural similarity between expressions is then evaluated by analyzing their dependency trees. To examine the semantic similarity of expressions, we propose using the semantic role labels obtained from the presented deep learning model. Experiments were performed on the corpus prepared for AAIC2015 and the corpus of the PAN2015 competitions. The results indicate that structural and semantic information improves the performance of the proposed method. ParsiPayesh is available at http://www.parsipayesh.ir.
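The abstract does not state how the structural and semantic similarities are computed, so the following Python sketch is only an illustration of the general idea, not the authors' method. It assumes that a parser and a semantic role labeler have already produced their output, that structural similarity can be approximated by the Jaccard overlap of dependency arcs, and that semantic similarity can be approximated by the per-role overlap of argument fillers. The names `Arc`, `structural_similarity`, `semantic_similarity`, and `combined_score`, the UD-style relation labels, the PropBank-style role labels, and the 0.5 weight are all hypothetical. English lemmas stand in for Persian ones.

```python
from typing import NamedTuple


class Arc(NamedTuple):
    """One edge of a dependency tree (hypothetical representation)."""
    head: str       # lemma of the governing word
    relation: str   # dependency label, e.g. "nsubj"
    dependent: str  # lemma of the dependent word


def jaccard(a: set, b: set) -> float:
    """Jaccard overlap of two sets; two empty sets count as identical."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def structural_similarity(arcs_a, arcs_b) -> float:
    """Assumed measure: share of dependency arcs the two trees have in common."""
    return jaccard(set(arcs_a), set(arcs_b))


def semantic_similarity(roles_a: dict, roles_b: dict) -> float:
    """Assumed measure: mean Jaccard overlap of the fillers of each role.

    Each argument maps a role label (e.g. "A0") to the set of lemmas
    that fill that role in the sentence.
    """
    labels = set(roles_a) | set(roles_b)
    if not labels:
        return 1.0
    total = sum(
        jaccard(roles_a.get(label, set()), roles_b.get(label, set()))
        for label in labels
    )
    return total / len(labels)


def combined_score(structural: float, semantic: float, weight: float = 0.5) -> float:
    """Weighted average of both scores; the weight is an arbitrary assumption."""
    return weight * structural + (1.0 - weight) * semantic


# "The student wrote the report." (active voice)
arcs_a = [Arc("write", "nsubj", "student"), Arc("write", "obj", "report")]
roles_a = {"A0": {"student"}, "A1": {"report"}}

# "The report was written by the student." (passive rewriting)
arcs_b = [Arc("write", "nsubj:pass", "report"), Arc("write", "obl", "student")]
roles_b = {"A0": {"student"}, "A1": {"report"}}

structural = structural_similarity(arcs_a, arcs_b)
semantic = semantic_similarity(roles_a, roles_b)
score = combined_score(structural, semantic)
print(structural, semantic, score)
```

In this toy pair the passive rewriting shares no dependency arcs with the original, while the semantic roles are identical. This is the kind of case in which semantic information could flag a rewriting that a purely structural comparison would miss.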