Selection of the Best K-Gram Value on Modified Rabin-Karp Algorithm
Wahyu Hidayat, Ema Utami, A. Sunyoto
IJCCS Indonesian Journal of Computing and Cybernetics Systems, journal article, published 2022-01-31
DOI: https://doi.org/10.22146/ijccs.63686
The Rabin-Karp algorithm detects similarity between texts using hashing techniques. Related studies have modified the hashing process, but none has investigated which value of k works best in the K-Gram step. In this study, the Nazief & Adriani algorithm is used at the stemming stage to reduce words to their base forms, and several K-Gram values are compared to determine the best one. The analysis uses the public Ukara Enhanced dataset obtained from Kaggle, which contains 12215 records in total. The student essay answers comprise 258 records in group A and 305 in group B, and every answer in a group is compared with the answers of the other members of the same group. The results show that k = 3 performs best: it yields the highest number of comparisons interpreted as 1-14% (little degree of similarity) and 15-50% (medium level of similarity), whereas k = 5, 7, and 9 yield the highest number of comparisons interpreted as 0%-0.99% (documents are different). However, when the compared student essay answers are 100% (exactly the same), the K-Gram value of k does not affect the result.
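
To make the role of k concrete, the following is a minimal sketch of character k-gram hashing with a Rabin-Karp style rolling hash, not the authors' modified algorithm. Several details are assumptions that the abstract does not specify: the radix `BASE`, the modulus `MOD`, the function names, and the use of the Dice coefficient over the two hash sets as the similarity score. Stemming with Nazief & Adriani is replaced here by a plain lowercase-and-strip normalization.

```python
import re

BASE = 256          # assumed radix for the polynomial hash
MOD = 1_000_003     # assumed prime modulus (collisions are possible but rare)


def normalize(text: str) -> str:
    """Lowercase and keep only letters and digits (stemming is omitted here)."""
    return re.sub(r"[^a-z0-9]", "", text.lower())


def kgram_hashes(text: str, k: int) -> set[int]:
    """Return the set of rolling-hash values of all character k-grams."""
    s = normalize(text)
    if len(s) < k:
        return set()
    high = pow(BASE, k - 1, MOD)  # weight of the leading character
    h = 0
    for ch in s[:k]:
        h = (h * BASE + ord(ch)) % MOD
    hashes = {h}
    for i in range(k, len(s)):
        # Drop the leading character, shift, and append the new one.
        h = ((h - ord(s[i - k]) * high) * BASE + ord(s[i])) % MOD
        hashes.add(h)
    return hashes


def similarity(a: str, b: str, k: int) -> float:
    """Dice coefficient over the two hash sets, as a percentage (assumed measure)."""
    ha, hb = kgram_hashes(a, k), kgram_hashes(b, k)
    if not ha or not hb:
        return 0.0
    return 200.0 * len(ha & hb) / (len(ha) + len(hb))


if __name__ == "__main__":
    answer_1 = "air menguap karena panas matahari"
    answer_2 = "air menguap akibat panas matahari"
    for k in (3, 5, 7, 9):
        print(k, round(similarity(answer_1, answer_2, k), 2))
```

Under these assumptions, two identical answers produce identical hash sets and therefore score 100% for every k, which is consistent with the last observation in the abstract. For partially overlapping answers, a larger k requires longer shared substrings before any hash matches, so fewer k-grams tend to be shared.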