Title: Persian Plagiarism Detection Using CNNs
Authors: S. Lazemi, H. Ebrahimpour-Komleh, N. Noroozi
Venue: 2018 8th International Conference on Computer and Knowledge Engineering (ICCKE)
Publication date: 2018-10-01
DOI: 10.1109/ICCKE.2018.8566340
URL: https://doi.org/10.1109/ICCKE.2018.8566340
The large and growing body of scientific research, together with the ease of accessing it, has led to abuse by opportunists and to illicit reuse of this work in scientific and academic settings. “Plagiarism” refers to using the scientific work of others without citing it correctly. Given the rapid growth of Persian electronic resources, this paper addresses plagiarism detection in Persian texts. Plagiarism detection consists of two distinct steps: candidate retrieval and text alignment. Our proposed method addresses both. In the first step, a Convolutional Neural Network (CNN) produces a document-level vector representation, and candidate documents are then retrieved using the k-means clustering algorithm. For text alignment, a CNN extracts sentence-level features, and classification algorithms then detect the copied sentences. Experiments were performed on the corpus prepared for the AAI competition and the corpus prepared for the PAN2015 competition. The achieved precision and recall are 0.843 and 0.806 on the first corpus and 0.833 and 0.826 on the second, respectively.
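The candidate retrieval step can be illustrated with a minimal sketch. It assumes the document-level vectors already exist: hand-written 2-D toy vectors stand in for the CNN output, whose architecture the abstract does not specify. The function names, the value of `k`, and the rule of returning every document in the cluster nearest to the suspicious document are illustrative assumptions, not the authors' implementation.

```python
import random


def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest_centroid(vec, centroids):
    """Index of the centroid closest to vec."""
    return min(range(len(centroids)),
               key=lambda c: squared_distance(vec, centroids[c]))


def kmeans(vectors, k, max_iterations=50, seed=0):
    """Plain Lloyd's k-means; returns (centroids, cluster index per vector)."""
    rng = random.Random(seed)
    # Initialise centroids from k distinct input vectors.
    centroids = [list(v) for v in rng.sample(vectors, k)]
    assignments = None
    for _ in range(max_iterations):
        new_assignments = [nearest_centroid(v, centroids) for v in vectors]
        if new_assignments == assignments:
            break  # converged: no vector changed cluster
        assignments = new_assignments
        for c in range(k):
            members = [v for v, a in zip(vectors, assignments) if a == c]
            if members:  # keep the old centroid if a cluster is empty
                centroids[c] = [sum(col) / len(members)
                                for col in zip(*members)]
    # Make the returned assignments consistent with the final centroids.
    assignments = [nearest_centroid(v, centroids) for v in vectors]
    return centroids, assignments


def retrieve_candidates(query_vec, doc_ids, centroids, assignments):
    """Return the ids of all source documents in the query's nearest cluster."""
    target = nearest_centroid(query_vec, centroids)
    return [d for d, a in zip(doc_ids, assignments) if a == target]


# Toy stand-ins for CNN document vectors (hypothetical values).
doc_ids = ["doc_a", "doc_b", "doc_c", "doc_d"]
doc_vecs = [[0.0, 0.0], [0.0, 1.0], [10.0, 10.0], [10.0, 11.0]]

centroids, assignments = kmeans(doc_vecs, k=2)
suspicious_vec = [9.5, 10.5]  # vector of the suspicious document
print(retrieve_candidates(suspicious_vec, doc_ids, centroids, assignments))
```

On this toy data the suspicious document falls in the cluster containing `doc_c` and `doc_d`, so only those two documents would be passed on to text alignment. Restricting the comparison to one cluster is what lets the second, more expensive step avoid scanning the whole collection.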