Dynamically Jointing character and word embedding for Chinese text Classification
Authors: Xuetao Tang, Xuegang Hu, Peipei Li
Venue: 2020 IEEE International Conference on Knowledge Graph (ICKG)
Published: 2020-08-01
DOI: 10.1109/ICBK50248.2020.00055 (https://doi.org/10.1109/ICBK50248.2020.00055)
Citations: 0
Abstract
Chinese text classification has drawn increasing attention in recent years. Unlike English text, Chinese text has no natural separator between words. With the development of deep learning, many character-level-only models have been proposed for Chinese text classification to tackle this problem, and they have achieved more success than word-level models. Word information is nevertheless important for Chinese text representation, especially for short texts that carry little information. However, most neural network models either simply concatenate the character-level and word-level representations or rely on massive external knowledge to represent the whole text, which is complex and time-consuming. To represent Chinese text better and more easily, without any external knowledge and using as much character and word information as possible, we propose a simple model that dynamically joins character and word embeddings, called DJCW. First, character-level and word-level BiLSTM models are introduced to extract features from texts of variable length. Second, the character and word representations are combined by weighting, and the weights are adjusted dynamically. Finally, experiments on five open-source text datasets show that our model can handle texts of different lengths and achieves good stability.
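The abstract does not say how the combination weights are computed, so the following is only a minimal sketch of one way a dynamic weighted combination of character-level and word-level features could look. It assumes the two BiLSTM encoders have already produced fixed-size feature vectors of equal length, and it assumes the weights come from a softmax over two scores. The names `fuse` and `score_params` are hypothetical and are not taken from the paper.

```python
import math
from typing import List, Sequence, Tuple


def softmax(scores: Sequence[float]) -> List[float]:
    """Numerically stable softmax over a short list of scores."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    """Inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def fuse(
    char_vec: Sequence[float],
    word_vec: Sequence[float],
    score_params: Sequence[float],
) -> Tuple[List[float], Tuple[float, float]]:
    """Combine char-level and word-level features with input-dependent weights.

    score_params is a hypothetical learned vector (an assumption, not the
    paper's parameterization). Each representation is scored against it,
    and a softmax turns the two scores into weights that sum to 1.
    """
    if not (len(char_vec) == len(word_vec) == len(score_params)):
        raise ValueError("all vectors must have the same length")
    w_char, w_word = softmax(
        [dot(score_params, char_vec), dot(score_params, word_vec)]
    )
    # Weighted sum, so the fused vector keeps the input dimensionality.
    fused = [w_char * c + w_word * w for c, w in zip(char_vec, word_vec)]
    return fused, (w_char, w_word)


# Toy inputs standing in for BiLSTM outputs (illustrative values only).
fused, (w_char, w_word) = fuse([1.0, 0.0], [0.0, 1.0], [1.0, 0.0])
```

Because the weights depend on the input vectors themselves, each text receives its own mix of character and word information. Plain concatenation, by contrast, gives every text the same fixed layout of both representations.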