Chinese readability assessment using TF-IDF and SVM
Yaw-Huei Chen, Yao-Hung Hubert Tsai, Yu-Ta Chen
2011 International Conference on Machine Learning and Cybernetics, 2011-07-10
DOI: https://doi.org/10.1109/ICMLC.2011.6016783
This paper proposes a simple yet effective method for automatically determining the readability of Chinese articles. We use mutual information to select the most important terms from the training data, calculate TF-IDF values for those terms, and use the values as features for a support vector machine (SVM). The resulting classification models identify articles suitable for lower-grade and for middle-grade elementary school students. Experiments on elementary school textbooks produce satisfactory results.
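The three steps named in the abstract (term selection by mutual information, TF-IDF weighting, SVM classification) can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation. The documents are assumed to be already word-segmented. The toy corpus uses hypothetical romanized tokens. The mutual information formula is the common presence/absence variant and the IDF is a smoothed variant, neither confirmed by the abstract. The classifier is a small bias-free linear SVM trained by subgradient descent on the hinge loss, standing in for whatever SVM package the paper used. All function names are invented for this example.

```python
import math
from collections import Counter


def mutual_information(docs, labels, term):
    """MI (in bits) between presence of `term` in a document and its class label."""
    n = len(docs)
    mi = 0.0
    for present in (True, False):
        n_t = sum(1 for d in docs if (term in d) == present)
        for c in sorted(set(labels)):
            n_c = labels.count(c)
            n_tc = sum(1 for d, y in zip(docs, labels)
                       if (term in d) == present and y == c)
            if n_tc:  # empty cells contribute nothing
                mi += (n_tc / n) * math.log2(n * n_tc / (n_t * n_c))
    return mi


def select_terms(docs, labels, k):
    """Keep the k terms with the highest MI (ties broken alphabetically)."""
    vocab = sorted({t for d in docs for t in d})
    ranked = sorted(vocab, key=lambda t: (-mutual_information(docs, labels, t), t))
    return ranked[:k]


def fit_idf(docs, terms):
    """Smoothed IDF computed on the training documents (one common variant)."""
    n = len(docs)
    return {t: math.log((1 + n) / (1 + sum(1 for d in docs if t in d))) + 1.0
            for t in terms}


def tfidf_vector(doc, terms, idf):
    """TF-IDF features restricted to the selected terms."""
    counts = Counter(doc)
    length = max(len(doc), 1)  # guard against empty documents
    return [counts[t] / length * idf[t] for t in terms]


def train_linear_svm(X, y, lam=0.01, eta=0.1, epochs=200):
    """Bias-free linear SVM: subgradient descent on L2-regularized hinge loss."""
    w = [0.0] * len(X[0])
    for _ in range(epochs):
        for x, yi in zip(X, y):
            margin = yi * sum(wj * xj for wj, xj in zip(w, x))
            w = [(1.0 - eta * lam) * wj for wj in w]  # regularization shrink
            if margin < 1.0:  # example violates the margin: hinge update
                w = [wj + eta * yi * xj for wj, xj in zip(w, x)]
    return w


def predict(w, x):
    """Return +1 (middle grade) or -1 (lower grade)."""
    return 1 if sum(wj * xj for wj, xj in zip(w, x)) >= 0 else -1


# Hypothetical pre-segmented toy corpus: -1 = lower grade, +1 = middle grade.
train_docs = [
    ["xiao", "mao", "pao"],
    ["xiao", "gou", "pao"],
    ["xiao", "niao", "fei"],
    ["huanjing", "baohu", "zhongyao"],
    ["kexue", "huanjing", "yanjiu"],
    ["kexue", "baohu", "ziyuan"],
]
train_labels = [-1, -1, -1, 1, 1, 1]

terms = select_terms(train_docs, train_labels, k=5)
idf = fit_idf(train_docs, terms)
X_train = [tfidf_vector(d, terms, idf) for d in train_docs]
w = train_linear_svm(X_train, train_labels)

new_doc = ["kexue", "huanjing", "ziyuan"]
print(terms, predict(w, tfidf_vector(new_doc, terms, idf)))
```

Restricting the TF-IDF vector to the selected terms keeps the feature space small, which is the apparent motivation for running mutual information selection first. In practice a segmentation tool and an established SVM library would replace the toy pieces here.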