用于抄袭检测的文本挖掘:用于文本相似度识别的多元模式检测

Konstantinos F. Xylogiannopoulos, P. Karampelas, R. Alhajj
{"title":"用于抄袭检测的文本挖掘:用于文本相似度识别的多元模式检测","authors":"Konstantinos F. Xylogiannopoulos, P. Karampelas, R. Alhajj","doi":"10.1109/ASONAM.2018.8508265","DOIUrl":null,"url":null,"abstract":"The problem of plagiarism the recent years has been intensified by the availability of information in digital form and the accessibility of the electronic libraries through the Internet. As a result, plagiarism detection has been transformed into a big data analytics problem since the number of digital sources is extravagant and a new document needs to be compared with millions of other existing documents. In this paper, a text mining methodology is proposed that can detect all common patterns between a document and the documents in a reference database. The technique is based on a pattern detection algorithm and the corresponding data structure that enables the algorithm to detect all common patterns. The methodology has been applied in a well-defined dataset providing very promising results identifying difficult cases of plagiarism such as technical disguise.","PeriodicalId":135949,"journal":{"name":"2018 IEEE/ACM International Conference on Advances in Social Networks Analysis and Mining (ASONAM)","volume":"69 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2018-08-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"6","resultStr":"{\"title\":\"Text Mining for Plagiarism Detection: Multivariate Pattern Detection for Recognition of Text Similarities\",\"authors\":\"Konstantinos F. Xylogiannopoulos, P. Karampelas, R. Alhajj\",\"doi\":\"10.1109/ASONAM.2018.8508265\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"The problem of plagiarism the recent years has been intensified by the availability of information in digital form and the accessibility of the electronic libraries through the Internet. As a result, plagiarism detection has been transformed into a big data analytics problem since the number of digital sources is extravagant and a new document needs to be compared with millions of other existing documents. In this paper, a text mining methodology is proposed that can detect all common patterns between a document and the documents in a reference database. The technique is based on a pattern detection algorithm and the corresponding data structure that enables the algorithm to detect all common patterns. The methodology has been applied in a well-defined dataset providing very promising results identifying difficult cases of plagiarism such as technical disguise.\",\"PeriodicalId\":135949,\"journal\":{\"name\":\"2018 IEEE/ACM International Conference on Advances in Social Networks Analysis and Mining (ASONAM)\",\"volume\":\"69 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2018-08-01\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"6\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"2018 IEEE/ACM International Conference on Advances in Social Networks Analysis and Mining (ASONAM)\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1109/ASONAM.2018.8508265\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"2018 IEEE/ACM International Conference on Advances in Social Networks Analysis and Mining (ASONAM)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/ASONAM.2018.8508265","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 6

摘要

近年来,由于数字形式的信息的可获得性和电子图书馆通过互联网的可访问性,剽窃问题愈演愈烈。因此,由于数字来源的数量非常庞大,并且需要将新文档与数百万其他现有文档进行比较,因此抄袭检测已经转变为一个大数据分析问题。本文提出了一种文本挖掘方法,该方法可以检测文档与参考数据库中文档之间的所有公共模式。该技术基于模式检测算法和相应的数据结构,该数据结构使算法能够检测所有通用模式。该方法已应用于一个定义良好的数据集,提供了非常有希望的结果,用于识别诸如技术伪装等困难的剽窃案例。
本文章由计算机程序翻译,如有差异,请以英文原文为准。
Text Mining for Plagiarism Detection: Multivariate Pattern Detection for Recognition of Text Similarities
The problem of plagiarism the recent years has been intensified by the availability of information in digital form and the accessibility of the electronic libraries through the Internet. As a result, plagiarism detection has been transformed into a big data analytics problem since the number of digital sources is extravagant and a new document needs to be compared with millions of other existing documents. In this paper, a text mining methodology is proposed that can detect all common patterns between a document and the documents in a reference database. The technique is based on a pattern detection algorithm and the corresponding data structure that enables the algorithm to detect all common patterns. The methodology has been applied in a well-defined dataset providing very promising results identifying difficult cases of plagiarism such as technical disguise.
求助全文
通过发布文献求助,成功后即可免费获取论文全文。 去求助
来源期刊
自引率
0.00%
发文量
0
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
确定
请完成安全验证×
copy
已复制链接
快去分享给好友吧!
我知道了
右上角分享
点击右上角分享
0
联系我们:info@booksci.cn Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。 Copyright © 2023 布克学术 All rights reserved.
京ICP备2023020795号-1
ghs 京公网安备 11010802042870号
Book学术文献互助
Book学术文献互助群
群 号:481959085
Book学术官方微信