An Experimental Study of Text Preprocessing Techniques for Automatic Short Answer Grading in Indonesian

2018 3rd International Conference on Information Technology, Information System and Electrical Engineering (ICITISEE). Pub Date: 2018-11-01. DOI: 10.1109/ICITISEE.2018.8720957

U. Hasanah, Tri Astuti, R. Wahyudi, Zanuar Rifai, Rilas Agung Pambudi


Citations: 17

Abstract

The preprocessing phase in information retrieval is intended to reduce the size of the text. Previous studies have applied many preprocessing techniques in applications such as clustering, classification, document indexing, summarization, and automatic essay grading. This experimental study measures the effectiveness of preprocessing techniques in Automatic Short Answer Grading (ASAG) using questions and answers in Indonesian. Indonesian is known to have a different morphology from English, and Indonesian language processing tools are limited, so we consider the techniques that can be used: case folding, tokenization, punctuation removal, stopword removal, and stemming. The data consist of 6 questions, each answered by 32 students, with one teacher's answer per question serving as the reference answer. We conducted two experiments. The first applied two preprocessing techniques, punctuation removal and tokenization. The second added three more: case folding, stemming, and stopword removal. In both, we measured the similarity between teacher and student answers using cosine similarity, then calculated correlation values and Mean Absolute Error to assess the effectiveness of the preprocessing techniques. A paired-samples t-test showed no significant difference between the two experiments.
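As a rough illustration of the two preprocessing configurations and the similarity measure named in the abstract, the following Python sketch may help. It is not the authors' implementation. The stopword list, the `toy_stem` suffix stripper, the use of raw term-frequency vectors, and all function names are assumptions made for this example, since the abstract does not state which tools or term weighting were used. The `mean_absolute_error` helper further assumes that teacher-assigned scores have been rescaled to the same 0 to 1 range as the similarity values.

```python
import math
import string
from collections import Counter

# Hypothetical, tiny stopword list for illustration only.
STOPWORDS = {"yang", "dan", "di", "ke", "dari", "adalah", "untuk"}


def toy_stem(token):
    """Strip a few common Indonesian suffixes (a stand-in for a real stemmer)."""
    for suffix in ("nya", "kan", "an", "i"):
        # Keep at least four characters so short words are left alone.
        if token.endswith(suffix) and len(token) - len(suffix) >= 4:
            return token[: -len(suffix)]
    return token


def preprocess_basic(text):
    """First configuration: punctuation removal, then whitespace tokenization."""
    no_punct = text.translate(str.maketrans("", "", string.punctuation))
    return no_punct.split()


def preprocess_full(text):
    """Second configuration: adds case folding, stopword removal, and stemming."""
    tokens = preprocess_basic(text.lower())
    return [toy_stem(t) for t in tokens if t not in STOPWORDS]


def cosine_similarity(tokens_a, tokens_b):
    """Cosine similarity between raw term-frequency vectors of two token lists."""
    a, b = Counter(tokens_a), Counter(tokens_b)
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0


def mean_absolute_error(predicted, actual):
    """Average absolute gap between predicted and reference scores."""
    return sum(abs(p - a) for p, a in zip(predicted, actual)) / len(actual)


if __name__ == "__main__":
    # Made-up answers, not taken from the study's data set.
    teacher = "Fotosintesis adalah proses pembuatan makanan."
    student = "fotosintesis adalah PROSES pembuatan makanan"
    for name, prep in (("basic", preprocess_basic), ("full", preprocess_full)):
        score = cosine_similarity(prep(teacher), prep(student))
        print(name, round(score, 3))
```

In this sketch the two configurations differ only in the function passed as `prep`, so the same teacher and student answers can be scored under both and the resulting score lists compared, for example with a paired-samples t-test from a statistics library.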
