Cross-Language Automatic Plagiarism Detector Using Latent Semantic Analysis and Self-Organizing Map

A. A. P. Ratna, Paskalis Nandana Yestha Nabhastala, Ihsan Ibrahim, F. A. Ekadiyanto, Muhammad Salman, Muhammad Yusuf Irfan Herusaktiawan, Prima Dewi Purnamasari
Published in: International Conference on Artificial Intelligence and Virtual Reality, 2018-11-23
DOI: 10.1145/3293663.3293681 (https://doi.org/10.1145/3293663.3293681)
Citations: 3

Abstract

Computer-assisted or automatic plagiarism detection can help people check whether the author of a paper has committed plagiarism. The Department of Electrical Engineering, Universitas Indonesia, has been developing a cross-language automatic plagiarism detector in which the test paper is written in Indonesian and the reference paper is written in English. A more accurate automatic detection system is needed to prevent plagiarism, especially in academic papers. The system is based on the Latent Semantic Analysis (LSA) algorithm, with a Self-Organizing Map (SOM) added to classify the output of LSA. Two features for the SOM, the Frobenius norm and cosine similarity, are extracted from the singular value matrix produced by LSA. Together with the percentage of technical terms, these features are used as input to the SOM, which classifies into 10, 5, 3, or 2 classes. Using 5 classes in LSA could give equal accuracy for all classes, with the highest accuracy reaching 83.09%. With LSA-SOM, the best accuracy is 83.53% on training data and 80.47% on testing data, obtained in the 2-class configuration with 3 features: percentage of technical terms, Frobenius norm, and pad.
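The abstract does not say exactly how the features are computed or how the SOM is configured, so the following is only a minimal NumPy sketch under assumed choices, not the authors' implementation. It assumes that each document is represented by a term-by-passage count matrix, that the Frobenius norm feature is the ratio of the norms of the two singular-value vectors, that cosine similarity is taken between those vectors, and that the SOM is a small one-dimensional map with one unit per class. All function names and parameters (`lsa_features`, `train_som`, `som_predict`, `k`, `epochs`, `lr0`) are hypothetical.

```python
import numpy as np

def singular_values(term_doc, k):
    """Return the k largest singular values, zero-padded if fewer exist."""
    s = np.linalg.svd(np.asarray(term_doc, dtype=float), compute_uv=False)
    out = np.zeros(k)
    out[: min(k, s.size)] = s[:k]  # svd returns values in descending order
    return out

def lsa_features(test_matrix, ref_matrix, k=2):
    """Assumed features: Frobenius-norm ratio and cosine similarity."""
    s_test = singular_values(test_matrix, k)
    s_ref = singular_values(ref_matrix, k)
    # The Frobenius norm of a diagonal singular-value matrix equals the
    # Euclidean norm of its diagonal.
    f_test = np.linalg.norm(s_test)
    f_ref = np.linalg.norm(s_ref)
    if f_test == 0.0 or f_ref == 0.0:
        return np.array([0.0, 0.0])
    frob_ratio = min(f_test, f_ref) / max(f_test, f_ref)  # assumed comparison
    cosine = float(s_test @ s_ref) / (f_test * f_ref)
    return np.array([frob_ratio, cosine])

def train_som(data, n_units=2, epochs=50, lr0=0.5, seed=0):
    """Train a one-dimensional SOM; returns the unit weight vectors."""
    data = np.asarray(data, dtype=float)
    if len(data) < n_units:
        raise ValueError("need at least n_units samples")
    rng = np.random.default_rng(seed)
    weights = data[:n_units].copy()  # arbitrary choice: init from first samples
    positions = np.arange(n_units)
    sigma0 = max(n_units / 2.0, 1.0)
    for epoch in range(epochs):
        frac = epoch / epochs
        lr = lr0 * (1.0 - frac)                # learning rate decays linearly
        sigma = sigma0 * (1.0 - frac) + 1e-3   # neighbourhood shrinks over time
        for x in data[rng.permutation(len(data))]:
            bmu = np.argmin(np.linalg.norm(weights - x, axis=1))
            h = np.exp(-((positions - bmu) ** 2) / (2.0 * sigma ** 2))
            weights += lr * h[:, None] * (x - weights)
    return weights

def som_predict(weights, x):
    """Return the index of the best-matching unit for feature vector x."""
    x = np.asarray(x, dtype=float)
    return int(np.argmin(np.linalg.norm(weights - x, axis=1)))
```

In use, each test/reference pair would contribute one feature vector, for example the two values from `lsa_features` plus a separately computed technical-term percentage, and `som_predict` would assign that vector to a unit. A SOM is unsupervised, so mapping units to plagiarism classes would still need labelled examples.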