Automatic Essay Scoring Using Data Augmentation in Bahasa Indonesia
Nurul Fadilah, Sigit Priyanta
IJCCS Indonesian Journal of Computing and Cybernetics Systems, 2022-10-31 (Journal Article)
DOI: 10.22146/ijccs.76396 (https://doi.org/10.22146/ijccs.76396)
Citations: 3
Abstract
Essays are a form of assessment that can measure students' abilities in depth. UKARA is an automatic essay scoring initiative that combines natural language processing (NLP) and machine learning. This study uses the datasets provided for the UKARA challenge, which come in two types: dataset A and dataset B. These datasets are small for training a model, which is one reason the resulting models are not optimal. This research therefore focuses on adding, or augmenting, data using EDA (Easy Data Augmentation techniques). Four operations are applied: Synonym Replacement (SR), Random Insertion (RI), Random Swap (RS), and Random Deletion (RD). The resulting data is used to train a BiLSTM model. Model performance is evaluated with a confusion matrix, using accuracy, precision, recall, and F-measure. On dataset A, the highest accuracy was 85.07%. It came from the model trained without augmentation and evaluated with k-fold cross-validation. On dataset B, the best result was 72.78%, obtained with EDA random insertion and k-fold cross-validation.
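The four EDA operations named in the abstract can be illustrated with a minimal sketch. It assumes whitespace tokenization and a tiny hypothetical synonym table (`SYNONYMS`). The abstract does not say which Indonesian synonym source, operation rates, or preprocessing the authors used, so the table entries and the `alpha` rate parameter here are illustrative assumptions.

```python
import random

# Hypothetical toy synonym table (assumption): the paper's actual Indonesian
# synonym source is not described in the abstract.
SYNONYMS = {
    "besar": ["luas", "agung"],
    "cepat": ["lekas", "kilat"],
    "rumah": ["kediaman", "hunian"],
}

def synonym_replacement(words, n, rng):
    """SR: replace up to n words that have a SYNONYMS entry with a random synonym."""
    new_words = list(words)
    candidates = [i for i, w in enumerate(new_words) if w in SYNONYMS]
    rng.shuffle(candidates)
    for i in candidates[:n]:
        new_words[i] = rng.choice(SYNONYMS[new_words[i]])
    return new_words

def random_insertion(words, n, rng):
    """RI: n times, insert a synonym of a random word at a random position."""
    new_words = list(words)
    for _ in range(n):
        candidates = [w for w in new_words if w in SYNONYMS]
        if not candidates:
            break
        synonym = rng.choice(SYNONYMS[rng.choice(candidates)])
        new_words.insert(rng.randint(0, len(new_words)), synonym)
    return new_words

def random_swap(words, n, rng):
    """RS: n times, swap the positions of two randomly chosen words."""
    new_words = list(words)
    if len(new_words) < 2:
        return new_words
    for _ in range(n):
        i, j = rng.sample(range(len(new_words)), 2)
        new_words[i], new_words[j] = new_words[j], new_words[i]
    return new_words

def random_deletion(words, p, rng):
    """RD: drop each word with probability p, always keeping at least one word."""
    if len(words) <= 1:
        return list(words)
    kept = [w for w in words if rng.random() > p]
    return kept if kept else [rng.choice(words)]

def eda(sentence, alpha=0.1, seed=0):
    """Return one augmented variant per EDA operation (alpha is an assumed rate)."""
    rng = random.Random(seed)
    words = sentence.split()
    n = max(1, int(alpha * len(words)))  # number of words each operation touches
    return {
        "SR": " ".join(synonym_replacement(words, n, rng)),
        "RI": " ".join(random_insertion(words, n, rng)),
        "RS": " ".join(random_swap(words, n, rng)),
        "RD": " ".join(random_deletion(words, alpha, rng)),
    }
```

For example, `eda("rumah itu besar dan dibangun dengan cepat", alpha=0.2, seed=1)` returns four variants of the sentence. Each variant is a candidate extra training essay labeled with the original essay's score.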
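The reported results rely on k-fold cross-validation. Below is a minimal stdlib sketch of how the folds can be formed and the per-fold scores averaged. The value of k, the shuffling seed, and the `evaluate_fold` callback (which would train and score a model such as the BiLSTM) are hypothetical, because the abstract does not specify them.

```python
import random

def k_fold_indices(n_samples, k=5, seed=0):
    """Yield (train_idx, test_idx) pairs; every sample is in exactly one test fold."""
    if not 2 <= k <= n_samples:
        raise ValueError("k must be between 2 and n_samples")
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    # Spread the remainder so fold sizes differ by at most one.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        test_idx = indices[start:start + size]
        train_idx = indices[:start] + indices[start + size:]
        yield train_idx, test_idx
        start += size

def cross_validate(n_samples, evaluate_fold, k=5, seed=0):
    """Average the score returned by a hypothetical evaluate_fold(train_idx, test_idx)."""
    scores = [evaluate_fold(tr, te) for tr, te in k_fold_indices(n_samples, k, seed)]
    return sum(scores) / len(scores)
```

When augmentation and k-fold evaluation are combined, a common precaution is to augment only each fold's training indices. This keeps augmented copies of a test essay out of training. The abstract does not state how the authors handled this.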
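Accuracy, precision, recall, and F-measure all come from the four confusion-matrix counts. The sketch below shows that computation for a binary task. It assumes labels 0/1 with 1 as the positive class; the exact UKARA label scheme is an assumption here.

```python
def confusion_counts(y_true, y_pred, positive=1):
    """Return (TP, FP, FN, TN) for a binary task with the given positive label."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    tp = fp = fn = tn = 0
    for t, p in zip(y_true, y_pred):
        if p == positive and t == positive:
            tp += 1
        elif p == positive:
            fp += 1
        elif t == positive:
            fn += 1
        else:
            tn += 1
    return tp, fp, fn, tn

def classification_metrics(y_true, y_pred, positive=1):
    """Accuracy, precision, recall and F-measure (F1), with 0.0 for empty denominators."""
    tp, fp, fn, tn = confusion_counts(y_true, y_pred, positive)
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total if total else 0.0
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f_measure = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f_measure": f_measure}
```

With `y_true = [1, 0, 1, 1, 0, 0]` and `y_pred = [1, 0, 0, 1, 1, 0]`, the counts are TP=2, FP=1, FN=1, TN=2. Accuracy is therefore 4/6, and precision, recall, and F-measure are each 2/3.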