Word level auto-correction for latent semantic analysis based essay grading system

2017 15th International Conference on Quality in Research (QiR): International Symposium on Electrical and Computer Engineering. Pub Date: 2017-07-01. DOI: 10.1109/QIR.2017.8168488

A. A. P. Ratna, Randy Sanjaya, Tomi Wirianata, Prima Dewi Purnamasari


Citations: 6


Word level auto-correction for latent semantic analysis based essay grading system

Assessment is an important step in the learning process, in which the assessor evaluates students' level of understanding. One form of assessment is the essay, which raises problems of scoring objectivity and of grader fatigue when many essays must be graded. To ease essay grading and resolve those problems, a system that can assess documents according to their context is needed. With this in mind, we developed a Java-based system for grading essays in the Indonesian language using a more efficient and optimal algorithm. The algorithm consists of four stages. The first stage is Latent Semantic Analysis (LSA), which is used to extract and infer the contextual relations among word meanings in a text. The second stage uses Singular Value Decomposition (SVD) to obtain the variance structure of those relations. SVD identifies the directions in which the variance is greatest and can therefore find the best approximation of the original data in a reduced number of dimensions. The third stage is Latent Semantic Indexing (LSI), an indexing and retrieval method that identifies patterns in the relations between terms and concepts contained in an unstructured text collection and produces a vector representing the text. The last stage is Cosine Similarity Measurement (CSM), which yields a similarity value between the essay text and the answer document. To resolve problems stemming from grammar and vocabulary, we propose an auto-correction technique that checks each word against a word library in order to equalize words with the same meaning or with no specific meaning. The Jaro-Winkler distance algorithm is then used to detect word errors caused by accidental mistyping. With this distance, we can determine whether two word strings are similar. This is extremely important when processing text with typos, as they affect the LSA result. Using this system, the score obtained is similar to the score given by a human rater. With a word library consisting of 97 words for the synonym check and 204 function words, the resulting accuracy is 85.246% ± 13.129.
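To make the typo-correction step concrete, the following is a minimal Python sketch of Jaro-Winkler similarity used to snap a mistyped word onto a vocabulary entry. It is an illustration under stated assumptions, not the authors' Java implementation: the function names (`jaro`, `jaro_winkler`, `correct`), the 0.9 acceptance threshold, and the sample Indonesian vocabulary are all hypothetical, and the scaling factor 0.1 with a prefix cap of 4 are the commonly used defaults rather than values reported in the abstract.

```python
def jaro(s1: str, s2: str) -> float:
    """Jaro similarity in [0, 1]; 1.0 means the strings are identical."""
    if s1 == s2:
        return 1.0
    len1, len2 = len(s1), len(s2)
    if len1 == 0 or len2 == 0:
        return 0.0
    # Characters count as matching only if they are within this window.
    window = max(max(len1, len2) // 2 - 1, 0)
    matched1 = [False] * len1
    matched2 = [False] * len2
    matches = 0
    for i, ch in enumerate(s1):
        lo = max(0, i - window)
        hi = min(len2, i + window + 1)
        for j in range(lo, hi):
            if not matched2[j] and s2[j] == ch:
                matched1[i] = matched2[j] = True
                matches += 1
                break
    if matches == 0:
        return 0.0
    # Transpositions: matched characters that appear in a different order.
    seq1 = [c for c, m in zip(s1, matched1) if m]
    seq2 = [c for c, m in zip(s2, matched2) if m]
    transpositions = sum(a != b for a, b in zip(seq1, seq2)) // 2
    return (matches / len1 + matches / len2
            + (matches - transpositions) / matches) / 3


def jaro_winkler(s1: str, s2: str, p: float = 0.1, max_prefix: int = 4) -> float:
    """Jaro similarity boosted for strings that share a common prefix."""
    j = jaro(s1, s2)
    prefix = 0
    for a, b in zip(s1, s2):
        if a != b or prefix == max_prefix:
            break
        prefix += 1
    return j + prefix * p * (1 - j)


def correct(word: str, vocabulary: list[str], threshold: float = 0.9) -> str:
    """Return the closest vocabulary word if it is similar enough.

    The threshold is an assumed value for illustration only.
    """
    best = max(vocabulary, key=lambda v: jaro_winkler(word, v), default=None)
    if best is not None and jaro_winkler(word, best) >= threshold:
        return best
    return word


if __name__ == "__main__":
    vocab = ["pendidikan", "penilaian"]  # hypothetical word library
    print(correct("pendidkan", vocab))   # typo with a dropped letter
    print(jaro_winkler("MARTHA", "MARHTA"))
```

On the textbook pair MARTHA / MARHTA the sketch gives a similarity of about 0.961. In a pipeline like the one described, such a correction would run before the term-document matrix is built, so that a mistyped word is counted as the intended term rather than as a new, unrelated one.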

