A. A. P. Ratna, F. A. Ekadiyanto, Mardiyah, Prima Dewi Purnamasari, Muhammad Salman
{"title":"Analysis on the Effect of Term-Document's Matrix to the Accuracy of Latent-Semantic-Analysis-Based Cross-Language Plagiarism Detection","authors":"A. A. P. Ratna, F. A. Ekadiyanto, Mardiyah, Prima Dewi Purnamasari, Muhammad Salman","doi":"10.1145/3033288.3033300","DOIUrl":null,"url":null,"abstract":"This paper presents the results of experimental investigation on the impact of term-document matrix variations to the accuracy of cross-language LSA-based plagiarism detection. The experiment was focusing in comparing Indonesian and English papers. The increase of document definition size as the source of matrix construction significantly caused negative impact to the detection accuracy in all scenarios. The results of the experiments showed that the document definition size must be kept below 10 in order to maintain high accuracy, and reached its worst performance at 25. Additionally, the implementation of term-document matrix using the frequency of word's occurrence was found beneficial to the improvement of detection accuracy compared to the binary implementation using simply the existence/absence of words.","PeriodicalId":253625,"journal":{"name":"International Conference on Network, Communication and Computing","volume":"2005 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2016-12-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"2","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"International Conference on Network, Communication and Computing","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1145/3033288.3033300","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 2
Abstract
This paper presents the results of experimental investigation on the impact of term-document matrix variations to the accuracy of cross-language LSA-based plagiarism detection. The experiment was focusing in comparing Indonesian and English papers. The increase of document definition size as the source of matrix construction significantly caused negative impact to the detection accuracy in all scenarios. The results of the experiments showed that the document definition size must be kept below 10 in order to maintain high accuracy, and reached its worst performance at 25. Additionally, the implementation of term-document matrix using the frequency of word's occurrence was found beneficial to the improvement of detection accuracy compared to the binary implementation using simply the existence/absence of words.