Enhancing automatic plagiarism detection: Using Doc2vec model

Imene Setha, H. Aliane
2022 International Conference on Advanced Aspects of Software Engineering (ICAASE), 17 September 2022
DOI: 10.1109/ICAASE56196.2022.9931542
Academic institutions define plagiarism as an act of cheating: stealing others’ ideas and passing them off as one’s own. As a result, plagiarism detection has attracted considerable research interest, and many different techniques have been applied to it. In this article, we propose a method to automatically detect different types of plagiarism in two languages. The method relies on sentence modelling to extract plagiarized passages from documents, using a Doc2Vec model to estimate the semantic similarity between documents and phrases. We use the PAN corpus for English plagiarism detection and AraPlagDet for Arabic. Both the PAN and AraPlagDet corpora provide a set of suspicious documents, plagiarized both manually and artificially, together with their source documents.
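The abstract gives no implementation details, so what follows is only a minimal sketch of how sentence-level candidate extraction with Doc2Vec-style embeddings might look. It is not the authors' method. It assumes that sentences are split naively on `.`, `!` and `?`, that every sentence is embedded as a vector, and that a suspicious/source sentence pair is flagged when its cosine similarity reaches a hypothetical threshold (0.8 by default). `make_doc2vec_embedder` uses gensim 4's `Doc2Vec`, `TaggedDocument` and `infer_vector`. All other names (`candidate_pairs`, `bag_of_words_embedder`, the threshold) are illustrative assumptions. The bag-of-words embedder is a dependency-free stand-in for testing only and is not Doc2Vec.

```python
import math
import re
from typing import Callable, List, Sequence, Tuple

Vector = Sequence[float]
Embedder = Callable[[List[str]], Vector]


def split_sentences(text: str) -> List[str]:
    # Naive splitter: break after '.', '!' or '?' followed by whitespace.
    # (Assumption: a real pipeline would use a proper tokenizer, especially for Arabic.)
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def tokenize(sentence: str) -> List[str]:
    # Lower-cased word tokens; \w matches Unicode letters, so Arabic words are kept.
    return re.findall(r"\w+", sentence.lower())


def cosine(u: Vector, v: Vector) -> float:
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def candidate_pairs(suspicious: str, source: str, embed: Embedder,
                    threshold: float = 0.8) -> List[Tuple[int, int, float]]:
    """Return (suspicious_sentence_idx, source_sentence_idx, score) for every
    sentence pair whose embeddings have cosine similarity >= threshold,
    highest score first. The 0.8 default is hypothetical, not from the paper."""
    susp_vecs = [embed(tokenize(s)) for s in split_sentences(suspicious)]
    src_vecs = [embed(tokenize(s)) for s in split_sentences(source)]
    pairs = []
    for i, u in enumerate(susp_vecs):
        for j, v in enumerate(src_vecs):
            score = cosine(u, v)
            if score >= threshold:
                pairs.append((i, j, score))
    return sorted(pairs, key=lambda p: p[2], reverse=True)


def make_doc2vec_embedder(corpus_sentences: List[str], vector_size: int = 100,
                          epochs: int = 40) -> Embedder:
    """Train a gensim (>= 4) Doc2Vec model on the given sentences and return an
    embedder based on infer_vector. gensim is imported lazily so the rest of
    this sketch runs without it; hyperparameters are illustrative assumptions."""
    from gensim.models.doc2vec import Doc2Vec, TaggedDocument

    docs = [TaggedDocument(words=tokenize(s), tags=[i])
            for i, s in enumerate(corpus_sentences)]
    model = Doc2Vec(documents=docs, vector_size=vector_size,
                    min_count=1, epochs=epochs)
    return lambda words: model.infer_vector(words)


def bag_of_words_embedder(vocabulary: List[str]) -> Embedder:
    """Dependency-free stand-in for testing only: term-count vectors, NOT Doc2Vec.
    Unlike Doc2Vec, it only rewards shared words, not paraphrases."""
    index = {w: k for k, w in enumerate(vocabulary)}

    def embed(words: List[str]) -> Vector:
        vec = [0.0] * len(vocabulary)
        for w in words:
            if w in index:
                vec[index[w]] += 1.0
        return vec

    return embed
```

With gensim installed, one could build `embed = make_doc2vec_embedder(split_sentences(source) + split_sentences(suspicious))` and pass it to `candidate_pairs`. Because `infer_vector` is stochastic and a model trained on only a handful of sentences yields noisy vectors, a more realistic setup would train on a larger collection (for example the PAN or AraPlagDet documents) and tune the threshold on held-out data.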