A. A. P. Ratna, Paskalis Nandana Yestha Nabhastala, Ihsan Ibrahim, F. A. Ekadiyanto, Muhammad Salman, Muhammad Yusuf Irfan Herusaktiawan, Prima Dewi Purnamasari
International Conference on Artificial Intelligence and Virtual Reality, published 2018-11-23. DOI: 10.1145/3293663.3293681 (https://doi.org/10.1145/3293663.3293681)
Cross-Language Automatic Plagiarism Detector Using Latent Semantic Analysis and Self-Organizing Map
Computer-assisted, or automatic, plagiarism detection can help a human reviewer check whether the author of a paper has committed plagiarism. The Department of Electrical Engineering, Universitas Indonesia, has been developing a cross-language automatic plagiarism detector in which the test paper is written in Indonesian and the reference paper is written in English. A more accurate automatic detection system is needed to prevent plagiarism, especially in academic papers. The system is based on the Latent Semantic Analysis (LSA) algorithm, with a Self-Organizing Map (SOM) added to classify the LSA output. Two features for the SOM, the Frobenius norm and the cosine similarity, are extracted from the singular value matrix produced by LSA. Together with the percentage of technical terms, these features are used as SOM inputs to classify the results into 10, 5, 3, and 2 classes. Using 5 classes in LSA can give equal accuracy across all classes, with the highest accuracy reaching 83.09%. With LSA-SOM, the best accuracy is 83.53% on training data and 80.47% on testing data, obtained in the 2-class configuration with 3 features: percentage of technical terms, Frobenius norm, and pad.
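As a rough illustration of the feature extraction described in the abstract, the sketch below computes a Frobenius norm and a cosine similarity from the singular values of two small term-by-passage count matrices. It is a minimal sketch under stated assumptions, not the authors' implementation: the function names (`singular_values`, `lsa_features`), the truncation rank `k`, the toy matrices, and the choice to compare the two singular value vectors directly are all hypothetical. The abstract does not specify how the matrices are built or how the two papers are brought into a common language, and the SOM stage is not shown.

```python
import numpy as np


def singular_values(term_doc: np.ndarray, k: int) -> np.ndarray:
    """Return the k largest singular values, zero-padded if fewer exist.

    `term_doc` is assumed to be a term-by-passage count matrix.
    """
    s = np.linalg.svd(term_doc, compute_uv=False)  # sorted in descending order
    out = np.zeros(k)
    n = min(k, s.size)
    out[:n] = s[:n]
    return out


def lsa_features(test_matrix: np.ndarray, ref_matrix: np.ndarray, k: int = 2):
    """Return (frobenius_test, frobenius_ref, cosine) from the singular values.

    Assumption: the Frobenius norm is taken over the diagonal singular value
    matrix, which equals the Euclidean norm of the singular value vector.
    """
    s_test = singular_values(test_matrix, k)
    s_ref = singular_values(ref_matrix, k)
    frob_test = float(np.linalg.norm(s_test))
    frob_ref = float(np.linalg.norm(s_ref))
    if frob_test == 0.0 or frob_ref == 0.0:
        cosine = 0.0  # an all-zero matrix carries no direction to compare
    else:
        cosine = float(s_test @ s_ref / (frob_test * frob_ref))
    return frob_test, frob_ref, cosine


# Toy count matrices (rows: terms, columns: passages); the values are invented.
test_paper = np.array([[2.0, 0.0, 1.0],
                       [0.0, 1.0, 0.0],
                       [1.0, 0.0, 3.0]])
reference_paper = np.array([[2.0, 0.0, 1.0],
                            [0.0, 2.0, 0.0],
                            [1.0, 1.0, 2.0]])

frob_test, frob_ref, cosine = lsa_features(test_paper, reference_paper, k=2)

# Hypothetical share of technical terms, supplied by some other component.
technical_term_pct = 0.25
feature_vector = np.array([technical_term_pct, frob_test, frob_ref, cosine])
print(feature_vector)
```

A vector like `feature_vector` is the kind of input a SOM classifier would receive. Which of these features to keep, and how to scale them, would depend on the configuration being evaluated.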