An Experimental Study of Text Preprocessing Techniques for Automatic Short Answer Grading in Indonesian

U. Hasanah, Tri Astuti, R. Wahyudi, Zanuar Rifai, Rilas Agung Pambudi

2018 3rd International Conference on Information Technology, Information System and Electrical Engineering (ICITISEE), November 2018. DOI: 10.1109/ICITISEE.2018.8720957
The preprocessing phase in information retrieval is intended to reduce the size of the text. Previous studies have applied many preprocessing techniques in applications such as Clustering, Classification, Document Indexing, Summarization, and Automatic Essay Grading. In this experimental study, we measure the effectiveness of preprocessing techniques for Automatic Short Answer Grading (ASAG) on questions and answers written in Indonesian, a language whose morphology differs from that of English. Because tools for processing Indonesian are limited, we work with techniques that can be applied readily: Case Folding, Tokenization, Punctuation Removal, Stopword Removal, and Stemming. Our data consist of 6 questions, each answered by 32 students, and one teacher's answer to each question serves as the reference answer. We conducted two experiments. The first applied two preprocessing techniques, Punctuation Removal and Tokenization; the second added three more: Case Folding, Stemming, and Stopword Removal. We measured the similarity between teacher and student answers using Cosine Similarity, then calculated correlation values and the Mean Absolute Error to assess the effectiveness of the preprocessing techniques. A paired-samples t-test showed no significant difference between the two experiments.
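The evaluation pipeline described in the abstract (two preprocessing configurations, cosine similarity against a reference answer, then MAE, correlation, and a paired t-test) can be illustrated with a minimal, standard-library-only Python sketch. It is not the authors' implementation and rests on several assumptions: `STOPWORDS` is a small illustrative subset of Indonesian function words, `crude_stem` is a hypothetical placeholder rather than a real Indonesian stemmer (such as one based on the Nazief-Adriani algorithm), cosine similarity uses raw term frequencies, human grades are assumed to be scaled to 0..1, and the paired t-test is applied to per-answer similarity scores from the two configurations, since the abstract does not say which quantity was tested. All data are made up.

```python
import math
import re
import statistics
from collections import Counter

# Illustrative subset of Indonesian stopwords (assumption; not the paper's list).
STOPWORDS = {"yang", "dan", "di", "ke", "dari", "ini", "itu",
             "untuk", "dengan", "pada", "adalah"}


def crude_stem(token):
    """Hypothetical placeholder stemmer: strips at most one common suffix.

    A real system would use a proper Indonesian stemmer; this only shows
    where stemming sits in the pipeline.
    """
    for suffix in ("nya", "lah", "kah", "kan"):
        if token.endswith(suffix) and len(token) - len(suffix) >= 4:
            return token[: -len(suffix)]
    return token


def preprocess(text, full=False):
    """full=False: experiment 1 (punctuation removal + tokenization).
    full=True: experiment 2 (adds case folding, stopword removal, stemming)."""
    tokens = re.sub(r"[^\w\s]", " ", text).split()
    if not full:
        return tokens
    tokens = [t.lower() for t in tokens]                # case folding
    tokens = [t for t in tokens if t not in STOPWORDS]  # stopword removal
    return [crude_stem(t) for t in tokens]              # stemming


def cosine_similarity(a, b):
    """Cosine similarity of raw term-frequency vectors (weighting is an assumption)."""
    va, vb = Counter(a), Counter(b)
    dot = sum(va[t] * vb[t] for t in va.keys() & vb.keys())
    norm = (math.sqrt(sum(c * c for c in va.values()))
            * math.sqrt(sum(c * c for c in vb.values())))
    return dot / norm if norm else 0.0


def score_answers(reference, answers, full):
    """Similarity of each student answer to the reference answer."""
    ref = preprocess(reference, full)
    return [cosine_similarity(ref, preprocess(ans, full)) for ans in answers]


def mean_absolute_error(pred, gold):
    return sum(abs(p - g) for p, g in zip(pred, gold)) / len(gold)


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        raise ValueError("correlation undefined for constant input")
    return cov / (sx * sy)


def paired_t_statistic(x, y):
    """t = mean(d) / (sd(d) / sqrt(n)) with d = x - y and n - 1 degrees of freedom."""
    d = [a - b for a, b in zip(x, y)]
    sd = statistics.stdev(d)
    if sd == 0:
        raise ValueError("differences have zero variance")
    return statistics.fmean(d) / (sd / math.sqrt(len(d)))


# Hypothetical toy data (not the paper's dataset): one reference answer,
# three student answers, and made-up human grades scaled to 0..1.
REFERENCE = "Fotosintesis adalah proses tumbuhan membuat makanan dengan bantuan cahaya matahari."
ANSWERS = [
    "Fotosintesis adalah proses tumbuhan membuat makanan dengan cahaya matahari.",
    "Tumbuhan membuat makanannya sendiri dengan bantuan cahaya.",
    "Proses pernapasan pada hewan.",
]
HUMAN = [1.0, 0.7, 0.0]

if __name__ == "__main__":
    exp1 = score_answers(REFERENCE, ANSWERS, full=False)
    exp2 = score_answers(REFERENCE, ANSWERS, full=True)
    for name, scores in (("experiment 1", exp1), ("experiment 2", exp2)):
        print(name, [round(s, 3) for s in scores],
              "MAE:", round(mean_absolute_error(scores, HUMAN), 3),
              "r:", round(pearson(scores, HUMAN), 3))
    # Only the t statistic is computed here; the p-value needs a t distribution
    # with n - 1 degrees of freedom (scipy.stats.ttest_rel returns both).
    print("paired t:", round(paired_t_statistic(exp2, exp1), 3))
```

On this toy data, the second configuration lets `makanannya` match `makanan` and `Tumbuhan` match `tumbuhan`, which raises the second answer's score. Three answers are far too few for a meaningful t-test; the study's setting of 6 questions with 32 answers each is where such a comparison becomes informative.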