Authors: Chanatip Saetia, P. Vateekul
Venue: 2018 15th International Joint Conference on Computer Science and Software Engineering (JCSSE)
Published: 2018-07-01
DOI: 10.1109/JCSSE.2018.8457376 (https://doi.org/10.1109/JCSSE.2018.8457376)
Enhance Accuracy of Hierarchical Text Categorization Based on Deep Learning Network Using Embedding Strategies
Hierarchical text categorization is the task of assigning predefined categories to text documents under a hierarchical constraint. Recently, deep learning techniques have shown many successful results in various fields, especially in text categorization. Our previous work, the Shared Hidden Layer Neural Network (SHL-NN), showed that sharing information between levels can improve model performance. However, that work relies on a sequence of unsupervised word embedding vectors, which likely limits its performance. In this paper, we propose a supervised document embedding, designed specifically for hierarchical text categorization and based on an autoencoder that is trained on both words and labels. To enhance the embedding vectors, we devise document embedding strategies that use the class hierarchy information during training. To transfer prediction results from the parent classes, we improve the shared information technique to make it more flexible and efficient. Experiments were conducted on three standard benchmarks (WIPO-C, WIPO-D, and Wiki) against two baselines: SHL-NN and HR-SVM, a top-down SVM framework with TF-IDF inputs. The results show that the proposed model outperforms all baselines in terms of macro F1.
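
As a rough illustration of the evaluation setting, the sketch below shows one way a hierarchical constraint and macro F1 could be computed. It is not the authors' code. The toy hierarchy `PARENT`, the helper `close_over_ancestors`, and the convention that a class with no true or predicted instances scores 0 are all assumptions made for this example. The abstract does not state how the constraint is enforced or how empty classes are scored.

```python
from typing import Dict, List, Optional, Set

# Hypothetical toy hierarchy (child -> parent); None marks a top-level class.
PARENT: Dict[str, Optional[str]] = {
    "A": None,
    "B": None,
    "A1": "A",
    "A2": "A",
    "B1": "B",
}


def close_over_ancestors(labels: Set[str]) -> Set[str]:
    """Add every ancestor of each label so the set respects the hierarchy."""
    closed: Set[str] = set()
    for label in labels:
        node: Optional[str] = label
        while node is not None:
            closed.add(node)
            node = PARENT[node]
    return closed


def macro_f1(y_true: List[Set[str]], y_pred: List[Set[str]]) -> float:
    """Unweighted mean of per-class F1 over all classes in PARENT."""
    scores = []
    for cls in PARENT:
        pairs = list(zip(y_true, y_pred))
        tp = sum(1 for t, p in pairs if cls in t and cls in p)
        fp = sum(1 for t, p in pairs if cls not in t and cls in p)
        fn = sum(1 for t, p in pairs if cls in t and cls not in p)
        denom = 2 * tp + fp + fn
        # Assumed convention: a class with no true or predicted instances scores 0.
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


# Three toy documents; the second is misclassified into the wrong branch.
y_true = [close_over_ancestors({"A1"}),
          close_over_ancestors({"B1"}),
          close_over_ancestors({"A2"})]
y_pred = [close_over_ancestors({"A1"}),
          close_over_ancestors({"A1"}),
          close_over_ancestors({"A2"})]

score = macro_f1(y_true, y_pred)
print(f"macro F1 = {score:.3f}")
```

Because macro F1 averages per-class scores without weighting by class frequency, the single error in the rare `B` branch pulls the score down sharply. This is why the metric is often preferred for hierarchies with many small leaf classes.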