Chinese readability assessment using TF-IDF and SVM
Authors: Yaw-Huei Chen, Yao-Hung Hubert Tsai, Yu-Ta Chen
Published in: 2011 International Conference on Machine Learning and Cybernetics, 2011-07-10
DOI: 10.1109/ICMLC.2011.6016783 (https://doi.org/10.1109/ICMLC.2011.6016783)
Citation count: 11
Abstract
This paper proposes a simple yet effective method for automatically determining the readability of Chinese articles. We use mutual information to select the most important terms from the training data, compute TF-IDF values for those terms, and use those values as features for an SVM, building classification models that identify articles suitable for lower-grade and middle-grade elementary school students. Experiments on elementary school textbooks produce satisfactory results.
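The three steps named in the abstract (term selection by mutual information, TF-IDF features over the selected terms, SVM classification) can be illustrated with a minimal sketch. Everything below is an assumption made for illustration and not the authors' implementation: the six-document toy corpus is invented, the articles are assumed to be already segmented into words, the function names are hypothetical, mutual information is taken between a term's presence and the class label, the TF-IDF variant (length-normalised term frequency, logarithmic IDF, L2 normalisation) is one common choice, and a bias-free linear SVM trained by subgradient descent stands in for whatever SVM implementation and kernel the paper actually used.

```python
import math
from collections import Counter

def mutual_information(doc_sets, labels, term, positive=1):
    """Mutual information (bits) between a term's presence and the class label."""
    n = len(doc_sets)
    n11 = sum(1 for s, y in zip(doc_sets, labels) if term in s and y == positive)
    n10 = sum(1 for s, y in zip(doc_sets, labels) if term in s and y != positive)
    n01 = sum(1 for s, y in zip(doc_sets, labels) if term not in s and y == positive)
    n00 = n - n11 - n10 - n01
    # Each cell: (joint count, marginal count for the term state, marginal count for the class)
    cells = [
        (n11, n11 + n10, n11 + n01),
        (n10, n11 + n10, n10 + n00),
        (n01, n01 + n00, n11 + n01),
        (n00, n01 + n00, n10 + n00),
    ]
    return sum(c / n * math.log2(n * c / (t * k)) for c, t, k in cells if c > 0)

def select_terms(docs, labels, k):
    """Keep the k terms with the highest mutual information (ties broken by term)."""
    doc_sets = [set(d) for d in docs]
    vocab = set().union(*doc_sets)
    ranked = sorted(vocab, key=lambda t: (-mutual_information(doc_sets, labels, t), t))
    return ranked[:k]

def fit_idf(docs, terms):
    """Inverse document frequency of each selected term on the training documents."""
    doc_sets = [set(d) for d in docs]
    return {t: math.log(len(docs) / sum(1 for s in doc_sets if t in s)) for t in terms}

def tfidf_vector(doc, terms, idf):
    """L2-normalised TF-IDF vector restricted to the selected terms."""
    counts = Counter(doc)
    vec = [counts[t] / max(len(doc), 1) * idf[t] for t in terms]
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]

def train_linear_svm(X, y, lam=0.01, eta=0.1, epochs=100):
    """Linear SVM (no bias term) fitted by subgradient descent on the
    L2-regularised hinge loss; labels must be +1 or -1."""
    w = [0.0] * len(X[0])
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            margin = yi * sum(wj * xj for wj, xj in zip(w, xi))
            w = [wj * (1 - eta * lam) for wj in w]  # step on the regulariser
            if margin < 1:  # hinge loss is active: move w towards yi * xi
                w = [wj + eta * yi * xj for wj, xj in zip(w, xi)]
    return w

def predict(w, x):
    """Return +1 (lower grades) or -1 (middle grades)."""
    return 1 if sum(wj * xj for wj, xj in zip(w, x)) >= 0 else -1

# Invented toy corpus of pre-segmented articles: +1 = lower grades, -1 = middle grades.
docs = [
    ["我", "爱", "妈妈"], ["我", "爱", "爸爸"], ["妈妈", "爱", "我"],
    ["科学", "研究", "环境"], ["环境", "保护", "研究"], ["科学", "保护", "环境"],
]
labels = [1, 1, 1, -1, -1, -1]

terms = select_terms(docs, labels, k=3)
idf = fit_idf(docs, terms)
X = [tfidf_vector(d, terms, idf) for d in docs]
w = train_linear_svm(X, labels)
```

A new article is classified by building its TF-IDF vector over the same selected terms and training-set IDF values, for example `predict(w, tfidf_vector(["我", "爱", "弟弟"], terms, idf))`. Restricting the feature space to high mutual information terms keeps the vectors short, which is presumably part of what makes the approach simple; a real system would also need a Chinese word segmenter in front of this pipeline.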