Cross-Language Plagiarism Detection Model Based on Multiple Features

2021 IEEE Symposium on Computers and Communications (ISCC) Pub Date : 2021-09-05 DOI:10.1109/ISCC53001.2021.9631406

Gang Liu, Yichao Dong, Guang Li


Citations: 0

Abstract



As information sharing becomes increasingly convenient, plagiarism has become more common. Cross-language plagiarism detection is an important problem that the academic community is working to solve collectively. This paper proposes a multiple-features-based cross-language plagiarism detection model with two stages: cross-language plagiarism candidate retrieval based on multiple features, and cross-language plagiarism detection based on dynamic text alignment. Candidate retrieval relies mainly on translation features. For detection, a text-alignment-based similarity analysis filters the final results among the identified paragraphs. In this step, our approach does not use a machine translation system to convert longer text; instead, it uses a dictionary to obtain the translation of single words. Experimental results show that our method outperforms previous methods and achieves the best results on four datasets.
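The idea of comparing texts through dictionary lookups of single words, rather than full machine translation, can be illustrated with a minimal sketch. This is not the authors' model: the abstract does not specify the dictionary, the language pair, the features, or the alignment procedure. The toy Spanish-to-English dictionary `TOY_DICT`, the `tokenize` helper, and the `translated_overlap` score below are all hypothetical and chosen only for illustration.

```python
import re

# Hypothetical toy bilingual dictionary (Spanish -> possible English words).
# The paper's real dictionary and language pair are not given in the abstract.
TOY_DICT = {
    "el": {"the"},
    "la": {"the"},
    "gato": {"cat"},
    "negro": {"black"},
    "duerme": {"sleeps", "sleep"},
    "en": {"in", "on"},
    "silla": {"chair", "seat"},
}


def tokenize(text):
    """Lowercase the text and split it on runs of non-word characters."""
    return [tok for tok in re.split(r"\W+", text.lower()) if tok]


def translated_overlap(source_text, target_text, dictionary):
    """Return the fraction of source tokens that have at least one
    dictionary translation occurring in the target text."""
    source_tokens = tokenize(source_text)
    target_vocab = set(tokenize(target_text))
    if not source_tokens:
        return 0.0
    hits = sum(
        1 for tok in source_tokens
        if dictionary.get(tok, set()) & target_vocab
    )
    return hits / len(source_tokens)


# A translated sentence pair scores high; an unrelated pair scores low.
related = translated_overlap(
    "El gato negro duerme en la silla",
    "The black cat sleeps on the chair",
    TOY_DICT,
)
unrelated = translated_overlap(
    "El gato negro duerme en la silla",
    "Stock prices fell sharply today",
    TOY_DICT,
)
print(related, unrelated)
```

A word-level score like this avoids translating long passages, at the cost of ignoring word order and context. A text-alignment step, as the abstract describes, would be one way to address that limitation.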

