Title: Multi-level Feature Fusion Method for Long Text Classification
Authors: R. Lin, Lianglun Cheng, Jianfeng Deng, Tao Wang
Venue: 2022 14th International Conference on Machine Learning and Computing (ICMLC)
Published: 2022-02-18
DOI: 10.1145/3529836.3529938 (https://doi.org/10.1145/3529836.3529938)
Citations: 0
Abstract
News classification is essentially a long-text classification task in the field of NLP (Natural Language Processing). Long texts contain a great deal of hidden or topic-irrelevant information. Moreover, BERT (Bidirectional Encoder Representations from Transformers) can process sequences of at most 512 tokens, so key information may be lost and classification effectiveness reduced. To address these problems, this paper proposes a multi-level feature fusion model based on BERT, which makes long text suitable for BERT through hierarchical decomposition. A CNN (Convolutional Neural Network) and a stacked BiLSTM (Bidirectional Long Short-Term Memory) with an attention mechanism are then used to capture local and contextual features of the text, respectively. Finally, the various features are concatenated for the classification task. The experimental results show that the model achieves 97.4% accuracy and a 97.2% F1 score on THUCNews, exceeding BERT-CNN by 1.2 points in accuracy and 1.6 points in F1, and BERT-BiLSTM by 1.8 points in accuracy and 1.4 points in F1, indicating that our model can significantly improve the effectiveness of news classification.
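The hierarchical decomposition step described above can be sketched as splitting a long token sequence into overlapping windows that each fit BERT's 512-token limit. This is a minimal illustrative sketch, not the paper's exact procedure: the function name, window size, and overlap (stride) strategy are assumptions.

```python
def chunk_tokens(tokens, max_len=512, stride=256):
    """Split a long token sequence into overlapping BERT-sized windows.

    Each chunk fits BERT's 512-token limit; the overlap (stride < max_len)
    preserves some context across chunk boundaries so that key information
    near a boundary is not lost. The stride value here is an assumption,
    not taken from the paper.
    """
    if len(tokens) <= max_len:
        return [tokens]
    chunks = []
    start = 0
    while start < len(tokens):
        chunks.append(tokens[start:start + max_len])
        if start + max_len >= len(tokens):
            break  # the final window already reaches the end of the text
        start += stride
    return chunks


# Example: a 1000-token document becomes three overlapping 512-token windows;
# each window would then be encoded by BERT before CNN/BiLSTM feature fusion.
windows = chunk_tokens(list(range(1000)))
print([len(w) for w in windows])
```

In the full model, each window's BERT representation would feed both the CNN branch (local features) and the attention-based stacked BiLSTM branch (contextual features), whose outputs are concatenated before the classifier.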