{"title":"用于文档分类的分层文本-标签集成关注网络","authors":"Changjin Gong, Kaize Shi, Zhendong Niu","doi":"10.1145/3341069.3342987","DOIUrl":null,"url":null,"abstract":"Recurrent neural networks (RNN) and convolutional neural networks (CNN) have been extensively used on text classification to capture the local and long-range dependencies. Recent work has demonstrated the superiority of self-attention networks (SAN) owing to their highly parallelizable computation and excellent performance. However, SAN has difficulty capturing meaningful semantic relationships over very long sequences, and the memory requirement grows rapidly in line with the sequence length. To solve these limitations of SAN in processing long document sequence, this paper proposes four novel ideas and build a hierarchical text-label integrated attention network(HLAN). Firstly, a hierarchical architecture is introduced to map the hierarchy of document, which effectively shortens the sequence length of each process. Secondly, the attention weights are calculated in the joint embedding space of text and label. Thirdly, a multi-head soft attention is proposed to compress the sequence encoded by self-attention into a single vector. Finally, a loss term called class loss is given and combined with cross entropy loss. HLAN achieves competitive results over the compared strong baseline methods on 4 out of 5 benchmark datasets, which verifies the effectiveness of HLAN for document classification, in terms of both accuracy and memory requirement.","PeriodicalId":411198,"journal":{"name":"Proceedings of the 2019 3rd High Performance Computing and Cluster Technologies Conference","volume":"36 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2019-06-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"5","resultStr":"{\"title\":\"Hierarchical Text-Label Integrated Attention Network for Document Classification\",\"authors\":\"Changjin Gong, Kaize Shi, Zhendong Niu\",\"doi\":\"10.1145/3341069.3342987\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Recurrent neural networks (RNN) and convolutional neural networks (CNN) have been extensively used on text classification to capture the local and long-range dependencies. Recent work has demonstrated the superiority of self-attention networks (SAN) owing to their highly parallelizable computation and excellent performance. However, SAN has difficulty capturing meaningful semantic relationships over very long sequences, and the memory requirement grows rapidly in line with the sequence length. To solve these limitations of SAN in processing long document sequence, this paper proposes four novel ideas and build a hierarchical text-label integrated attention network(HLAN). Firstly, a hierarchical architecture is introduced to map the hierarchy of document, which effectively shortens the sequence length of each process. Secondly, the attention weights are calculated in the joint embedding space of text and label. Thirdly, a multi-head soft attention is proposed to compress the sequence encoded by self-attention into a single vector. Finally, a loss term called class loss is given and combined with cross entropy loss. 
HLAN achieves competitive results over the compared strong baseline methods on 4 out of 5 benchmark datasets, which verifies the effectiveness of HLAN for document classification, in terms of both accuracy and memory requirement.\",\"PeriodicalId\":411198,\"journal\":{\"name\":\"Proceedings of the 2019 3rd High Performance Computing and Cluster Technologies Conference\",\"volume\":\"36 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2019-06-22\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"5\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Proceedings of the 2019 3rd High Performance Computing and Cluster Technologies Conference\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1145/3341069.3342987\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Proceedings of the 2019 3rd High Performance Computing and Cluster Technologies Conference","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1145/3341069.3342987","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Hierarchical Text-Label Integrated Attention Network for Document Classification
Recurrent neural networks (RNN) and convolutional neural networks (CNN) have been extensively used for text classification to capture local and long-range dependencies. Recent work has demonstrated the superiority of self-attention networks (SAN) owing to their highly parallelizable computation and excellent performance. However, SAN has difficulty capturing meaningful semantic relationships over very long sequences, and its memory requirement grows rapidly with sequence length. To address these limitations of SAN in processing long document sequences, this paper proposes four novel ideas and builds a hierarchical text-label integrated attention network (HLAN). Firstly, a hierarchical architecture is introduced to mirror the hierarchy of a document, which effectively shortens the sequence length processed at each level. Secondly, attention weights are calculated in the joint embedding space of text and labels. Thirdly, multi-head soft attention is proposed to compress the sequence encoded by self-attention into a single vector. Finally, a loss term called class loss is introduced and combined with the cross-entropy loss. HLAN achieves competitive results against strong baseline methods on 4 out of 5 benchmark datasets, verifying its effectiveness for document classification in terms of both accuracy and memory requirement.
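The abstract's second and third ideas (label-aware attention weights and multi-head soft attention pooling) can be illustrated with a minimal sketch. The PyTorch code below is a hypothetical re-implementation based only on the abstract; the module names, tensor shapes, and design choices (one scoring vector per head, heads averaged at the end) are assumptions, not the authors' actual architecture.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class LabelAwareAttention(nn.Module):
    """Sketch of idea 2: score each token against learned label embeddings in a
    shared space, so attention weights reflect text-label compatibility.
    (Hypothetical; the paper's exact formulation may differ.)"""

    def __init__(self, hidden_dim: int, num_classes: int):
        super().__init__()
        self.label_emb = nn.Embedding(num_classes, hidden_dim)

    def forward(self, x: torch.Tensor, mask: torch.Tensor = None) -> torch.Tensor:
        # x: (batch, seq_len, hidden_dim)
        # Compatibility of each token with each label: (batch, seq_len, num_classes)
        logits = x @ self.label_emb.weight.t()
        if mask is not None:
            logits = logits.masked_fill(mask.unsqueeze(-1) == 0, float("-inf"))
        weights = F.softmax(logits, dim=1)  # normalize over the sequence axis
        # Label-specific document representations: (batch, num_classes, hidden_dim)
        return torch.einsum("bsc,bsd->bcd", weights, x)


class MultiHeadSoftAttentionPooling(nn.Module):
    """Sketch of idea 3: compress a self-attention-encoded sequence into a
    single vector using several learned attention heads.
    (Hypothetical; head count and head aggregation are assumptions.)"""

    def __init__(self, hidden_dim: int, num_heads: int = 4):
        super().__init__()
        self.score = nn.Linear(hidden_dim, num_heads, bias=False)

    def forward(self, x: torch.Tensor, mask: torch.Tensor = None) -> torch.Tensor:
        # x: (batch, seq_len, hidden_dim)
        scores = self.score(x)  # (batch, seq_len, num_heads)
        if mask is not None:
            scores = scores.masked_fill(mask.unsqueeze(-1) == 0, float("-inf"))
        weights = F.softmax(scores, dim=1)  # attention over the sequence axis
        # Weighted sum per head: (batch, num_heads, hidden_dim)
        pooled = torch.einsum("bsh,bsd->bhd", weights, x)
        return pooled.mean(dim=1)  # one vector per document (or sentence)
```

In a hierarchical setup (idea 1), a pooling module like the one above could plausibly be applied twice: once to compress word representations into sentence vectors and once to compress sentence vectors into a document vector, which is roughly how a hierarchy shortens the sequence processed at each level. The class loss of idea 4 is not specified in the abstract, so it is not sketched here.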