Multi-Channel Text Classification Model Based on ERNIE
Dongxue Bao, Donghong Qin, Lila Hong, Siqi Zhan
Proceedings of the 2022 11th International Conference on Computing and Pattern Recognition, 2022-11-17
DOI: 10.1145/3581807.3581853
Abstract
News and review text data are large in volume, their features are sparse, and traditional text feature representations cannot dynamically capture the grammatical structure, semantic information, and multi-dimensional rich features of entity phrases. To address this, this paper proposes to obtain more generalized knowledge-based semantic features, such as rich context phrases and entity words, by integrating ERNIE (Enhanced Representation Through Knowledge Integration), a knowledge-enhanced semantic representation. The pre-trained language model ERNIE randomly masks words and entities and predicts them from context, yielding word-vector language representations. ERNIE's output vectors are fed into three channels, a BiLSTM, an Attention mechanism, and a DPCNN network, to generate high-order text feature vectors, and each channel vector is processed by BatchNormalization and a ReLU activation function, so that the semantic description information of the multi-channel word vectors is fused. The proposed model not only improves training speed and prevents overfitting, but also enriches feature information such as semantics and grammatical structure, thereby improving text classification performance. Experiments on two datasets comparing the proposed model with other improved ERNIE models in terms of accuracy, precision, recall, and F1 score show that it obtains multi-dimensional, rich semantic and grammatical-structure features for text classification and thus improves classification performance.
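The fusion pattern the abstract describes (per-channel BatchNormalization and ReLU, then concatenation of the channel vectors) can be sketched in a few lines. This is a minimal numpy illustration, not the paper's implementation: the three pooling operations below are simple stand-ins for the actual BiLSTM, Attention, and DPCNN encoders, and the batch size, sequence length, and 768-dimensional hidden size are assumed for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

def batch_norm(x, eps=1e-5):
    # Inference-style batch normalization over the batch dimension
    # (learned scale/shift omitted for brevity).
    mu = x.mean(axis=0, keepdims=True)
    var = x.var(axis=0, keepdims=True)
    return (x - mu) / np.sqrt(var + eps)

def relu(x):
    return np.maximum(x, 0.0)

# Stand-in for ERNIE output: 4 texts, 32 tokens, 768-dim token vectors.
ernie_out = rng.standard_normal((4, 32, 768))

# Placeholder channel encoders. In the paper these are a BiLSTM, an
# Attention mechanism, and a DPCNN; the poolings here only demonstrate
# the multi-channel shape flow.
channels = [
    ernie_out.mean(axis=1),   # stand-in for the BiLSTM channel
    ernie_out.max(axis=1),    # stand-in for the DPCNN channel
    ernie_out[:, 0, :],       # stand-in for the Attention channel
]

# Each channel vector passes through BatchNormalization and ReLU,
# then the channels are concatenated into one fused feature vector
# per text, ready for a classification layer.
fused = np.concatenate([relu(batch_norm(c)) for c in channels], axis=-1)
print(fused.shape)  # (4, 2304)
```

A classifier head (e.g. a softmax layer over the fused 2304-dimensional vector) would follow; per-channel normalization before fusion is what keeps one channel's scale from dominating the others.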