Automated Plagiarism Detection Model Based On Deep Siamese Network
Jing Zhang, Siyuan Xue, Jierui Li, Jian She
2022 IEEE 8th International Conference on Cloud Computing and Intelligent Systems (CCIS), published 2022-11-26
DOI: 10.1109/ccis57298.2022.10016354 (https://doi.org/10.1109/ccis57298.2022.10016354)
Citations: 0
Abstract
This paper presents a novel deep Siamese network for automatic plagiarism detection. The model uses the large-scale pre-trained model BERT (Bidirectional Encoder Representations from Transformers) to represent text as word vectors, Bi-LSTM (bidirectional long short-term memory) networks to extract the contextual semantic features of the text, and a text semantic interaction mechanism to obtain interactive semantic features. The Siamese structure maps both texts of a matched pair into the same parameter matrix space. Multi-head self-attention then fuses the text-pair vectors for accurate semantic alignment and similarity measurement. Experimental results show that the model can effectively identify and detect plagiarized text.
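The data flow described in the abstract (a shared encoder applied to both texts, an interaction step between them, then a similarity score) can be illustrated with a minimal sketch. It is not the authors' implementation and makes several loud assumptions: a deterministic hashed token embedding stands in for the BERT and Bi-LSTM encoder, single-head dot-product attention stands in for multi-head self-attention, and the names `embed_token`, `encode`, `attend`, `similarity`, and the size `DIM` are hypothetical. Because the embeddings are hashed rather than learned, the score reflects token overlap only, not semantics.

```python
import hashlib
import math

DIM = 16  # hypothetical embedding size, chosen only for illustration


def embed_token(token: str) -> list[float]:
    """Deterministic pseudo-embedding; a toy stand-in for BERT word vectors."""
    digest = hashlib.sha256(token.lower().encode("utf-8")).digest()
    # Map the first DIM bytes of the hash to floats in [-1, 1].
    return [(byte - 127.5) / 127.5 for byte in digest[:DIM]]


def encode(text: str) -> list[list[float]]:
    """Shared encoder: both branches call this one function, which is the
    Siamese property (identical parameters for both texts)."""
    return [embed_token(token) for token in text.split()]


def dot(u: list[float], v: list[float]) -> float:
    return sum(a * b for a, b in zip(u, v))


def softmax(scores: list[float]) -> list[float]:
    peak = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend(queries: list[list[float]], keys: list[list[float]]) -> list[list[float]]:
    """Single-head scaled dot-product attention: each query vector becomes a
    softmax-weighted average of the key vectors (the interaction step)."""
    scale = math.sqrt(DIM)
    aligned = []
    for q in queries:
        weights = softmax([dot(q, k) / scale for k in keys])
        aligned.append(
            [sum(w * k[i] for w, k in zip(weights, keys)) for i in range(DIM)]
        )
    return aligned


def mean_pool(vectors: list[list[float]]) -> list[float]:
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(DIM)]


def cosine(u: list[float], v: list[float]) -> float:
    norm = math.sqrt(dot(u, u)) * math.sqrt(dot(v, v))
    return dot(u, v) / norm if norm else 0.0


def similarity(text_a: str, text_b: str) -> float:
    """Cosine similarity of the two fused, pooled text representations."""
    a, b = encode(text_a), encode(text_b)
    if not a or not b:
        return 0.0  # assumption: an empty text matches nothing
    # Fuse each token vector with its attention-aligned view of the other text.
    fused_a = [[x + y for x, y in zip(t, al)] for t, al in zip(a, attend(a, b))]
    fused_b = [[x + y for x, y in zip(t, al)] for t, al in zip(b, attend(b, a))]
    return cosine(mean_pool(fused_a), mean_pool(fused_b))


if __name__ == "__main__":
    source = "the model maps both texts into the same space"
    suspect = "both texts are mapped by the model into one space"
    print(f"similarity = {similarity(source, suspect):.3f}")
```

In a trained system, a threshold on this score (or a classifier on the fused vectors) would decide whether a pair counts as plagiarized; the abstract does not state which decision rule the authors use.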