用于抄袭检测的文本挖掘:用于文本相似度识别的多元模式检测

2018 IEEE/ACM International Conference on Advances in Social Networks Analysis and Mining (ASONAM) Pub Date : 2018-08-01 DOI:10.1109/ASONAM.2018.8508265

Konstantinos F. Xylogiannopoulos, P. Karampelas, R. Alhajj

{"title":"用于抄袭检测的文本挖掘:用于文本相似度识别的多元模式检测","authors":"Konstantinos F. Xylogiannopoulos, P. Karampelas, R. Alhajj","doi":"10.1109/ASONAM.2018.8508265","DOIUrl":null,"url":null,"abstract":"The problem of plagiarism the recent years has been intensified by the availability of information in digital form and the accessibility of the electronic libraries through the Internet. As a result, plagiarism detection has been transformed into a big data analytics problem since the number of digital sources is extravagant and a new document needs to be compared with millions of other existing documents. In this paper, a text mining methodology is proposed that can detect all common patterns between a document and the documents in a reference database. The technique is based on a pattern detection algorithm and the corresponding data structure that enables the algorithm to detect all common patterns. The methodology has been applied in a well-defined dataset providing very promising results identifying difficult cases of plagiarism such as technical disguise.","PeriodicalId":135949,"journal":{"name":"2018 IEEE/ACM International Conference on Advances in Social Networks Analysis and Mining (ASONAM)","volume":"69 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2018-08-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"6","resultStr":"{\"title\":\"Text Mining for Plagiarism Detection: Multivariate Pattern Detection for Recognition of Text Similarities\",\"authors\":\"Konstantinos F. Xylogiannopoulos, P. Karampelas, R. Alhajj\",\"doi\":\"10.1109/ASONAM.2018.8508265\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"The problem of plagiarism the recent years has been intensified by the availability of information in digital form and the accessibility of the electronic libraries through the Internet. As a result, plagiarism detection has been transformed into a big data analytics problem since the number of digital sources is extravagant and a new document needs to be compared with millions of other existing documents. In this paper, a text mining methodology is proposed that can detect all common patterns between a document and the documents in a reference database. The technique is based on a pattern detection algorithm and the corresponding data structure that enables the algorithm to detect all common patterns. The methodology has been applied in a well-defined dataset providing very promising results identifying difficult cases of plagiarism such as technical disguise.\",\"PeriodicalId\":135949,\"journal\":{\"name\":\"2018 IEEE/ACM International Conference on Advances in Social Networks Analysis and Mining (ASONAM)\",\"volume\":\"69 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2018-08-01\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"6\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"2018 IEEE/ACM International Conference on Advances in Social Networks Analysis and Mining (ASONAM)\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1109/ASONAM.2018.8508265\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"2018 IEEE/ACM International Conference on Advances in Social Networks Analysis and Mining (ASONAM)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/ASONAM.2018.8508265","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}

引用次数: 6

摘要

近年来，由于数字形式的信息的可获得性和电子图书馆通过互联网的可访问性，剽窃问题愈演愈烈。因此，由于数字来源的数量非常庞大，并且需要将新文档与数百万其他现有文档进行比较，因此抄袭检测已经转变为一个大数据分析问题。本文提出了一种文本挖掘方法，该方法可以检测文档与参考数据库中文档之间的所有公共模式。该技术基于模式检测算法和相应的数据结构，该数据结构使算法能够检测所有通用模式。该方法已应用于一个定义良好的数据集，提供了非常有希望的结果，用于识别诸如技术伪装等困难的剽窃案例。

本文章由计算机程序翻译，如有差异，请以英文原文为准。

查看原文本刊更多论文

Text Mining for Plagiarism Detection: Multivariate Pattern Detection for Recognition of Text Similarities

The problem of plagiarism the recent years has been intensified by the availability of information in digital form and the accessibility of the electronic libraries through the Internet. As a result, plagiarism detection has been transformed into a big data analytics problem since the number of digital sources is extravagant and a new document needs to be compared with millions of other existing documents. In this paper, a text mining methodology is proposed that can detect all common patterns between a document and the documents in a reference database. The technique is based on a pattern detection algorithm and the corresponding data structure that enables the algorithm to detect all common patterns. The methodology has been applied in a well-defined dataset providing very promising results identifying difficult cases of plagiarism such as technical disguise.

求助全文

通过发布文献求助，成功后即可免费获取论文全文。去求助

来源期刊

2018 IEEE/ACM International Conference on Advances in Social Networks Analysis and Mining (ASONAM)

自引率

0.00%

发文量