Comparative analysis of string similarity and corpus-based similarity for automatic essay scoring system on e-learning gamification

2016 International Conference on Advanced Computer Science and Information Systems (ICACSIS)
Publication date: 2016-10-01
DOI: 10.1109/ICACSIS.2016.7872785

Eko Sakti Pramukantoro, M. Fauzi

Citations: 25

Abstract

Essay assessment in e-learning has to be conducted manually by human experts, a process that is time-consuming and costly. Hence, automatic essay scoring is needed. Because the scoring system will be integrated into an e-learning platform, it requires a computationally lightweight method that does not sacrifice assessment accuracy. In this paper, we propose an automatic scoring system for essay examinations using unsupervised approaches. We compare and analyze two similarity measures: cosine similarity and latent semantic analysis (LSA). The methods are evaluated on computational cost (measured by CPU usage, memory usage, and page load time) and on accuracy (measured by Pearson correlation and mean absolute error, MAE). The results show that both algorithms consume the same amount of memory. CPU usage is 0.13% for LSA and 0.06% for cosine similarity. Cosine similarity also loads faster than LSA, with page load times of 0.2 seconds and 0.5 seconds, respectively. In terms of Pearson correlation, LSA outperforms cosine similarity, 0.59 versus 0.49. LSA also has a lower MAE than cosine similarity (5.33 versus 5.69). These results show that LSA and cosine similarity are highly competitive in accuracy. However, cosine similarity has better server performance and is therefore preferred for implementation in an e-learning automatic essay scoring system.
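The abstract names the parts of the pipeline without showing them. The following is a minimal sketch of the cosine-similarity branch and of the two accuracy metrics, assuming each student answer is compared against a single reference answer using raw term-frequency vectors. The tokenizer, the `score_answer` rule that scales similarity by a hypothetical `max_score`, and all function names are illustrative assumptions, not the authors' implementation. The LSA variant would instead project the term-document matrix onto a truncated SVD before taking the cosine; that step is not shown here.

```python
import math
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    """Lowercase the text and split it into alphanumeric word tokens (assumed preprocessing)."""
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine_similarity(a: str, b: str) -> float:
    """Cosine of the angle between the term-frequency vectors of two texts."""
    va, vb = Counter(tokenize(a)), Counter(tokenize(b))
    # Counter returns 0 for missing terms, so only shared terms add to the dot product.
    dot = sum(count * vb[term] for term, count in va.items())
    norm_a = math.sqrt(sum(c * c for c in va.values()))
    norm_b = math.sqrt(sum(c * c for c in vb.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty text shares no terms with the other one
    return dot / (norm_a * norm_b)


def score_answer(answer: str, reference: str, max_score: float = 100.0) -> float:
    """Hypothetical scoring rule: scale the similarity in [0, 1] to the exam's point range."""
    return cosine_similarity(answer, reference) * max_score


def pearson(xs: list[float], ys: list[float]) -> float:
    """Pearson correlation coefficient between two equal-length score lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    if sx == 0 or sy == 0:
        raise ValueError("Pearson correlation is undefined for constant scores")
    return cov / (sx * sy)


def mean_absolute_error(predicted: list[float], actual: list[float]) -> float:
    """Average absolute gap between automatic and human scores."""
    return sum(abs(p - a) for p, a in zip(predicted, actual)) / len(actual)


if __name__ == "__main__":
    # Made-up reference, answers, and human grades, for illustration only.
    reference = "photosynthesis converts light energy into chemical energy stored in glucose"
    answers = [
        "plants convert light energy into chemical energy stored as glucose",
        "photosynthesis makes oxygen",
        "the mitochondria is the powerhouse of the cell",
    ]
    human_scores = [90.0, 40.0, 5.0]
    auto_scores = [score_answer(a, reference) for a in answers]
    print("automatic scores:", [round(s, 1) for s in auto_scores])
    print("Pearson r:", round(pearson(auto_scores, human_scores), 3))
    print("MAE:", round(mean_absolute_error(auto_scores, human_scores), 2))
```

`pearson` and `mean_absolute_error` correspond to the two accuracy measures named in the abstract. On Python 3.10 and later, the standard library's `statistics.correlation` computes the same Pearson coefficient.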
