Chinese Text Classification Method Based on BERT Word Embedding

Proceedings of the 2020 5th International Conference on Mathematics and Artificial Intelligence, Pub Date: 2020-04-10, DOI: 10.1145/3395260.3395273

Ziniu Wang, Zhilin Huang, Jianling Gao

Citations: 6

Abstract

In this paper, we enhance the semantic representation of words with the BERT pre-trained language model, which dynamically generates a semantic vector for each character according to its context. The resulting character vectors are then fed into a CapsNet as a character-level word vector sequence. We built a BiGRU module into the capsule network for text feature extraction and introduced an attention mechanism to focus on key information. We conduct experiments on the corpus of Baidu's Chinese question-and-answer dataset, taking only the question types as classification samples. We used the BERT network alone and the CapsNet alone as comparative experiments. The experimental results show that the combined model performs better than either model used alone.
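Two of the building blocks named in the abstract, attention over per-character features and the capsule layer's nonlinearity, can be illustrated with a minimal, dependency-free sketch. This is not the authors' implementation: the abstract does not specify the form of attention, so dot-product scoring against a single vector is assumed here, and the names `attention_pool`, `squash`, `hidden_states`, and `query` are hypothetical. The toy vectors stand in for BiGRU outputs computed from BERT character embeddings; the `squash` function follows the standard capsule-network formulation.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attention_pool(hidden_states, query):
    """Score each timestep by its dot product with `query` (an assumed
    attention form), then return the weighted sum and the weights."""
    scores = [sum(h_i * q_i for h_i, q_i in zip(h, query)) for h in hidden_states]
    weights = softmax(scores)
    dim = len(hidden_states[0])
    pooled = [sum(w * h[d] for w, h in zip(weights, hidden_states)) for d in range(dim)]
    return pooled, weights

def squash(vec, eps=1e-9):
    """Capsule squash: keeps the vector's direction, maps its length into [0, 1)."""
    sq_norm = sum(v * v for v in vec)
    scale = sq_norm / (1.0 + sq_norm) / math.sqrt(sq_norm + eps)
    return [scale * v for v in vec]

# Toy stand-ins for per-character BiGRU outputs (3 characters, 4 dimensions).
hidden_states = [
    [0.1, 0.3, -0.2, 0.5],
    [0.9, -0.4, 0.7, 0.2],
    [0.0, 0.1, 0.1, -0.3],
]
query = [1.0, 0.0, 1.0, 0.0]  # hypothetical learned attention vector

pooled, weights = attention_pool(hidden_states, query)
capsule = squash(pooled)
capsule_length = math.sqrt(sum(v * v for v in capsule))
print(weights, capsule_length)
```

In a capsule network the length of a squashed vector is read as the probability that the corresponding class or feature is present, which is why `squash` bounds it below 1. A full model would replace the toy lists with tensors and learn `query` during training.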


