A. A. P. Ratna, F. A. Ekadiyanto, Mardiyah, Prima Dewi Purnamasari, Muhammad Salman
International Conference on Network, Communication and Computing, 2016-12-17. DOI: 10.1145/3033288.3033300
Analysis on the Effect of Term-Document's Matrix to the Accuracy of Latent-Semantic-Analysis-Based Cross-Language Plagiarism Detection
This paper presents the results of an experimental investigation into how variations in the term-document matrix affect the accuracy of cross-language plagiarism detection based on Latent Semantic Analysis (LSA). The experiment focused on comparing Indonesian and English papers. Increasing the document definition size used to construct the matrix significantly reduced detection accuracy in all scenarios. The results showed that the document definition size must be kept below 10 to maintain high accuracy, with the worst performance occurring at a size of 25. In addition, building the term-document matrix from word-occurrence frequencies was found to improve detection accuracy compared with a binary matrix that records only the presence or absence of each word.
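The abstract contrasts two ways of filling the term-document matrix: raw word-occurrence counts and binary presence/absence. The Python sketch below illustrates only that construction step, under assumptions that do not come from the paper. The helpers `split_into_documents`, `tokenize`, and `term_document_matrix` are hypothetical names. "Document definition size" is treated as the number of consecutive sentences grouped into one matrix column, because the abstract does not state its unit. Tokenization is plain lower-cased word matching. The cross-language alignment and the SVD step of LSA are not shown.

```python
import re
from collections import Counter


def split_into_documents(sentences, size):
    """Group consecutive sentences into pseudo-documents of `size` sentences.

    Assumption (not from the paper): "document definition size" is read here
    as sentences per matrix column; the abstract does not state the unit.
    """
    if size < 1:
        raise ValueError("size must be >= 1")
    return [" ".join(sentences[i:i + size]) for i in range(0, len(sentences), size)]


def tokenize(text):
    # Lower-cased word tokens only; a real system would likely add
    # stemming and stop-word removal for Indonesian and English.
    return re.findall(r"\w+", text.lower())


def term_document_matrix(documents, binary=False):
    """Return (vocabulary, matrix) where matrix[t][d] weights term t in document d.

    binary=False: raw occurrence counts (frequency weighting).
    binary=True:  1 if the term occurs in the document at all, else 0.
    """
    counts = [Counter(tokenize(doc)) for doc in documents]
    vocabulary = sorted(set().union(*counts))
    matrix = [
        # Counter returns 0 for absent terms, so missing words get weight 0.
        [int(c[term] > 0) if binary else c[term] for c in counts]
        for term in vocabulary
    ]
    return vocabulary, matrix


if __name__ == "__main__":
    sentences = ["The cat sat.", "The cat ran.", "A dog ran."]
    docs = split_into_documents(sentences, size=2)
    vocab, freq = term_document_matrix(docs)
    _, binary = term_document_matrix(docs, binary=True)
    for term, f_row, b_row in zip(vocab, freq, binary):
        print(term, f_row, b_row)  # e.g. "cat [2, 0] [1, 0]"
```

The frequency matrix keeps the fact that "cat" appears twice in the first pseudo-document, while the binary matrix collapses it to 1. In an LSA pipeline, either matrix would then be factored with a truncated singular value decomposition, and documents would typically be compared by cosine similarity in the reduced space; this sketch stops before that step.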