Title: Study on Chinese text classification for FastText that combing TF-RF and improved random walk model
Author: Zheng Wang
Venue: 2021 6th International Conference on Intelligent Computing and Signal Processing (ICSP)
Publication date: 2021-04-09
DOI: https://doi.org/10.1109/ICSP51882.2021.9408910
Citation count: 0
Abstract
FastText is a text classification model developed by Facebook. Because its structure is simple, it is fast and efficient. However, its accuracy drops when it is applied to Chinese text classification. To address this, the paper proposes a Chinese FastText text classification method that combines Term Frequency-Relevance Frequency (TF-RF) with an improved random walk model. At the input stage of the FastText model, the method applies TF-RF weighting to select entries from the N-gram processed dictionary, performs semantic analysis with Probabilistic Latent Semantic Analysis (PLSA), and uses the result to supplement the feature words. It then applies the improved random walk model to raise accuracy, so that the resulting model is better suited to Chinese text classification. The experimental results show that the improved model achieves better results on Chinese text classification.
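
To make the TF-RF weighting step concrete, here is a minimal Python sketch. It is not the paper's implementation: the abstract does not give the exact formula or selection rule, so the sketch assumes the commonly used relevance frequency definition rf = log2(2 + a / max(1, c)), where a and c are the numbers of documents inside and outside the target category that contain the term. It also uses character bigrams as a stand-in for the N-gram dictionary. The function names (`char_ngrams`, `rf_weights`, `tf_rf`, `top_k`) and the toy corpus are hypothetical.

```python
import math
from collections import Counter


def char_ngrams(text, n=2):
    """Return the character n-grams of a string (assumed stand-in for the N-gram dictionary)."""
    return [text[i:i + n] for i in range(len(text) - n + 1)]


def rf_weights(docs, labels, positive):
    """Relevance frequency per term, assuming rf = log2(2 + a / max(1, c)).

    a = number of documents in the `positive` category containing the term
    c = number of documents in all other categories containing the term
    """
    pos_df, neg_df = Counter(), Counter()
    for terms, label in zip(docs, labels):
        target = pos_df if label == positive else neg_df
        target.update(set(terms))  # document frequency: count each term once per document
    vocab = set(pos_df) | set(neg_df)
    return {t: math.log2(2 + pos_df[t] / max(1, neg_df[t])) for t in vocab}


def tf_rf(terms, rf):
    """TF-RF weight of each term in one document: term frequency times rf."""
    tf = Counter(terms)
    # A term unseen during fitting has a = c = 0, which gives rf = log2(2) = 1.
    return {t: tf[t] * rf.get(t, 1.0) for t in tf}


def top_k(weights, k):
    """Keep the k highest-weighted terms (one possible selection rule, assumed here)."""
    return sorted(weights, key=weights.get, reverse=True)[:k]


# Toy corpus (hypothetical): two sports documents and one finance document.
texts = ["足球比赛精彩", "股票市场下跌", "足球联赛开幕"]
labels = ["sports", "finance", "sports"]
docs = [char_ngrams(t) for t in texts]

rf = rf_weights(docs, labels, positive="sports")
weights = tf_rf(docs[0], rf)
selected = top_k(weights, 3)
```

In this toy corpus the bigram 足球 occurs in both sports documents and in no finance document, so it receives the largest relevance frequency and survives selection. How the selected terms are then combined with the PLSA-derived feature words and the improved random walk model is not specified in the abstract and is left out of the sketch.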