Enhancing automatic plagiarism detection: Using Doc2vec model

Imene Setha, H. Aliane
Published in: 2022 International Conference on Advanced Aspects of Software Engineering (ICAASE), 2022-09-17
DOI: 10.1109/ICAASE56196.2022.9931542 (https://doi.org/10.1109/ICAASE56196.2022.9931542)
Citation count: 0

Abstract

Academic institutions define plagiarism as an act of cheating: stealing others' ideas and passing them off as one's own. Plagiarism detection has therefore attracted considerable interest, and multiple techniques have been applied to it. In this article, we propose a method to automatically detect different types of plagiarism in two languages. The method is based on sentence modelling: it tries to extract plagiarized passages from documents using the Doc2Vec model, which predicts semantic similarity between documents and phrases. We use the PAN corpus for English plagiarism detection and AraPlagDet for Arabic. Both the PAN and AraPlagDet corpora provide a set of suspicious documents that are manually and artificially plagiarized, along with their sources.
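
The sentence-level comparison described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation. The abstract gives no model parameters, thresholds, or preprocessing steps, so everything below is an assumption: the function names (`embed`, `cosine`, `split_sentences`, `candidate_pairs`) and the threshold of 0.8 are hypothetical, and `embed` is a toy bag-of-words vector standing in for a vector inferred by a trained Doc2Vec model. The sketch only shows the general shape of the idea: split both documents into sentences, map each sentence to a vector, and flag suspicious/source sentence pairs whose cosine similarity exceeds a threshold.

```python
import math
import re
from collections import Counter


def embed(sentence):
    """Toy stand-in for a Doc2Vec inferred vector (assumption):
    a sparse bag-of-words count vector over lowercased tokens."""
    return Counter(re.findall(r"\w+", sentence.lower()))


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as Counters."""
    dot = sum(u[t] * v[t] for t in u.keys() & v.keys())
    norm_u = math.sqrt(sum(c * c for c in u.values()))
    norm_v = math.sqrt(sum(c * c for c in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def split_sentences(text):
    """Naive splitter on '.', '!' and '?'; real corpora need a proper
    sentence tokenizer, especially for Arabic."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def candidate_pairs(suspicious, source, threshold=0.8):
    """Return (suspicious_sentence, source_sentence, score) triples
    whose similarity is at least `threshold` (a hypothetical value)."""
    src = [(s, embed(s)) for s in split_sentences(source)]
    hits = []
    for sent in split_sentences(suspicious):
        vec = embed(sent)
        for src_sent, src_vec in src:
            score = cosine(vec, src_vec)
            if score >= threshold:
                hits.append((sent, src_sent, score))
    return hits


if __name__ == "__main__":
    source = "Plagiarism is the act of stealing ideas. The weather is nice today."
    suspicious = "I went to the market. Plagiarism is the act of stealing ideas!"
    for sus_sent, src_sent, score in candidate_pairs(suspicious, source):
        print(f"{score:.2f}  {sus_sent!r}  <-  {src_sent!r}")
```

With a real Doc2Vec model, `embed` would be replaced by a vector inferred from the trained model. Unlike the word-count stand-in used here, such vectors are intended to capture semantic similarity, which is what would let the approach reach paraphrased rather than only copied passages.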