Intrinsic Plagiarism Detection System Using Stylometric Features and DBSCAN

Anu Saini, Manepalli Ratna Sri, Mansi Thakur
Published in: 2021 International Conference on Computing, Communication, and Intelligent Systems (ICCCIS)
Publication date: 2021-02-19
DOI: 10.1109/ICCCIS51004.2021.9397187 (https://doi.org/10.1109/ICCCIS51004.2021.9397187)
Citations: 3

Abstract

Plagiarism is the act of using someone else’s words or ideas without due credit and presenting them as one’s own work. Advances in technology, especially the Internet, together with offline sources such as books and magazines, have made it very easy to plagiarize others' work. On the basis of detection, plagiarism can be classified into two broad categories: extrinsic and intrinsic. Extrinsic plagiarism detection compares a document against a given reference dataset, whereas intrinsic plagiarism detection relies on variations in writing style within the document and uses no reference corpus. Although many approaches exist for extrinsic plagiarism detection, few are available for intrinsic detection. This paper proposes a simplified approach to building an intrinsic plagiarism detector that can detect plagiarism even when no reference corpus is available. The system identifies the writing styles of the authors present in a document using stylometric features and Density-Based Spatial Clustering of Applications with Noise (DBSCAN). It has an easy-to-use interactive interface: the user uploads a text document to be checked for plagiarism, and the result is displayed on the web page itself. The user can also view an analysis of the document in the form of graphs.
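
The abstract names the two ingredients (stylometric features and DBSCAN) but not the details, so the following is only a minimal sketch of how such a pipeline might look, not the authors' implementation. The four features, the z-score scaling, the `eps` and `min_samples` defaults, and the rule of flagging every segment outside the largest cluster are all assumptions made for illustration; the function names are hypothetical.

```python
import math
import re
from statistics import mean, pstdev


def stylometric_features(text):
    """Illustrative (assumed) features: average word length, average
    sentence length in words, type-token ratio, and punctuation rate."""
    words = re.findall(r"[A-Za-z']+", text)
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    n_words = max(len(words), 1)
    avg_word_len = sum(len(w) for w in words) / n_words
    avg_sent_len = len(words) / max(len(sentences), 1)
    type_token_ratio = len({w.lower() for w in words}) / n_words
    punct_rate = sum(text.count(c) for c in ",;:") / n_words
    return [avg_word_len, avg_sent_len, type_token_ratio, punct_rate]


def standardize(rows):
    """Z-score each column so no single feature dominates the distance."""
    cols = list(zip(*rows))
    mus = [mean(c) for c in cols]
    sds = [pstdev(c) or 1.0 for c in cols]  # constant column: avoid dividing by 0
    return [[(v - m) / s for v, m, s in zip(row, mus, sds)] for row in rows]


def dbscan(points, eps, min_samples):
    """Plain DBSCAN with Euclidean distance.
    Returns one label per point; -1 marks noise."""
    n = len(points)
    labels = [None] * n

    def neighbors(i):
        # A point counts as its own neighbor (distance 0).
        return [j for j in range(n) if math.dist(points[i], points[j]) <= eps]

    cluster = -1
    for i in range(n):
        if labels[i] is not None:
            continue
        nb = neighbors(i)
        if len(nb) < min_samples:
            labels[i] = -1  # provisionally noise; may become a border point
            continue
        cluster += 1
        labels[i] = cluster
        queue = [j for j in nb if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == -1:
                labels[j] = cluster  # noise reached from a core point: border
            if labels[j] is not None:
                continue
            labels[j] = cluster
            nb_j = neighbors(j)
            if len(nb_j) >= min_samples:  # j is a core point, keep expanding
                queue.extend(nb_j)
    return labels


def flag_suspicious(segments, eps=1.5, min_samples=2):
    """Assumed decision rule: the largest cluster is taken as the main
    author's style; every other segment (other clusters or noise) is flagged."""
    feats = standardize([stylometric_features(s) for s in segments])
    labels = dbscan(feats, eps, min_samples)
    clustered = [label for label in labels if label != -1]
    if not clustered:
        return labels, []
    dominant = max(set(clustered), key=clustered.count)
    return labels, [i for i, label in enumerate(labels) if label != dominant]
```

A caller would split the uploaded document into segments (for example, paragraphs), pass them to `flag_suspicious`, and treat the returned indices as candidates for review. DBSCAN suits this setting because it needs no preset number of authors and labels isolated segments as noise, but its output is sensitive to `eps` and `min_samples`, which would need tuning on real documents.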