Assessing Short Answers in Indonesian Using Semantic Text Similarity Method and Dynamic Corpus

2020 12th International Conference on Information Technology and Electrical Engineering (ICITEE). Pub Date: 2020-10-06. DOI: 10.1109/ICITEE49829.2020.9271696

U. Hasanah, Bambang Pilu Hartato


Citations: 2

Abstract

Automatic short-answer assessment is a computer-assisted testing task in which answers written in natural language are scored automatically. Several methods have been used to build systems whose scores come close to human marking. For Indonesian, string-based similarity methods that match keywords are easy to apply, as previous studies have done. However, short answers differ in content, question type, and answer length, and string-based methods alone cannot accommodate these characteristics. This study implements a hybrid method that combines corpus-based and string-based similarity. The Semantic Text Similarity (STS) method is used to assess short answers in Indonesian. STS combines three similarity measures: Normalized and Modified Longest Common Subsequence, Second Order Co-occurrence Pointwise Mutual Information, and Common Word Order Similarity. We also use a dynamic corpus, which has the advantage of being relatively small and adaptable to the learning domain. The dynamic corpus is generated with the Gensim module and is built from the top five student answers that the module returns. STS is compared with Cosine Similarity because Cosine Similarity is the method most commonly used to assess answers in Indonesian. The results show that STS outperforms Cosine Similarity in terms of Mean Absolute Error, but does not outperform it in terms of correlation.
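To make the compared measures concrete, the following is a minimal sketch, not the authors' implementation. It assumes the commonly cited definition of normalized LCS, NLCS(a, b) = len(LCS(a, b))^2 / (len(a) * len(b)), a bag-of-words cosine similarity over whitespace tokens, and the standard Mean Absolute Error. All function names are hypothetical, and the corpus-based component (Second Order Co-occurrence PMI) and the Gensim-built dynamic corpus are deliberately left out.

```python
import math
from collections import Counter


def lcs_length(a: str, b: str) -> int:
    """Length of the longest common subsequence of two strings (row-wise DP)."""
    prev = [0] * (len(b) + 1)
    for ch_a in a:
        curr = [0]
        for j, ch_b in enumerate(b, start=1):
            if ch_a == ch_b:
                curr.append(prev[j - 1] + 1)
            else:
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def nlcs(a: str, b: str) -> float:
    """Normalized LCS (assumed definition): squared LCS length over the length product."""
    if not a or not b:
        return 0.0
    return lcs_length(a, b) ** 2 / (len(a) * len(b))


def cosine_similarity(a: str, b: str) -> float:
    """Cosine similarity between term-count vectors of two whitespace-tokenized texts."""
    va, vb = Counter(a.lower().split()), Counter(b.lower().split())
    dot = sum(va[term] * vb[term] for term in va)
    norm_a = math.sqrt(sum(c * c for c in va.values()))
    norm_b = math.sqrt(sum(c * c for c in vb.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def mean_absolute_error(predicted, actual) -> float:
    """Average absolute difference between system scores and human scores."""
    return sum(abs(p - a) for p, a in zip(predicted, actual)) / len(actual)
```

In an evaluation of the kind the abstract describes, each similarity score would be scaled to the grading range and compared against human marks with `mean_absolute_error`; the scaling step is an assumption and is not specified in the abstract.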


