Rui Xiao, W. Guo, Yunchun Zhang, Xiaoyan Ma, Jiaqi Jiang
Title: Machine Learning-based Automated Essay Scoring System for Chinese Proficiency Test (HSK)
Published in: Proceedings of the 4th International Conference on Natural Language Processing and Information Retrieval, 2020-12-18
DOI: https://doi.org/10.1145/3443279.3443299
Citations: 4
Abstract
Automated essay scoring (AES) has recently gained momentum in English-language settings. However, the development of Chinese AES systems has been slow and has produced few results. Many foreign students take the Chinese Proficiency Test (HSK), so an HSK automated essay scoring system (HSK AES) is in high demand. To develop an effective and reliable HSK AES system, this paper proposes three machine learning and deep learning models that take HSK essays as input. We apply Word2vec and TF-IDF (term frequency-inverse document frequency) to extract features from the original essays. Three models are trained: XGBoost, a deep neural network with a flatten layer and a dense layer, and a deep neural network with an LSTM (long short-term memory) layer and a dense layer. The experimental results show that XGBoost with TF-IDF outperforms the other two models, achieving the lowest MAE (mean absolute error) of 6.7%. We also show that deep neural networks, whether built with LSTM or with flatten layers, perform unsatisfactorily on HSK AES.
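
The two quantities named in the abstract, TF-IDF features and MAE, can be illustrated with a minimal standard-library sketch. It is not the authors' implementation: the character-level `tokenize`, the unsmoothed `log(N/df)` weighting, and the function names are assumptions made for illustration, and the regressor itself (XGBoost in the paper) is omitted. The abstract does not say how the 6.7% figure is normalized, so the MAE below is reported in raw score units.

```python
import math
from collections import Counter


def tokenize(text):
    # Assumption: character-level tokens. The abstract does not state
    # which segmentation the authors used for Chinese text.
    return [ch for ch in text if not ch.isspace()]


def tfidf_vectors(docs):
    """Return (vocabulary, vectors) with tf = count / length, idf = log(N / df)."""
    tokenized = [tokenize(d) for d in docs]
    vocab = sorted({tok for toks in tokenized for tok in toks})
    n_docs = len(docs)
    # Document frequency: number of documents containing each token.
    df = Counter(tok for toks in tokenized for tok in set(toks))
    idf = {tok: math.log(n_docs / df[tok]) for tok in vocab}
    vectors = []
    for toks in tokenized:
        counts = Counter(toks)
        total = len(toks) or 1  # guard against empty essays
        vectors.append([counts[tok] / total * idf[tok] for tok in vocab])
    return vocab, vectors


def mean_absolute_error(y_true, y_pred):
    """Average absolute difference between reference and predicted scores."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("score lists must be non-empty and of equal length")
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


if __name__ == "__main__":
    # Hypothetical toy essays and scores, for illustration only.
    essays = ["我喜欢学习中文", "我喜欢看书"]
    vocab, vectors = tfidf_vectors(essays)
    print(len(vocab), "features per essay")
    print(mean_absolute_error([60, 80], [65, 70]))
```

In a full pipeline, the rows of `vectors` would be the inputs to a regressor trained on human-assigned scores, and `mean_absolute_error` would be computed on held-out essays.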