Ganhua Li, Bo Kong, Jiancheng Li, Henghai Fan, Jian Zhang, Yuan An, Zhenglei Yang, Shengrong Danz, Jiancun Fan
2022 International Conference on Computer Engineering and Artificial Intelligence (ICCEAI), published 2022-07-01
DOI: 10.1109/ICCEAI55464.2022.00105 (https://doi.org/10.1109/ICCEAI55464.2022.00105)
Citations: 1
A BERT-based Text Sentiment Classification Algorithm through Web Data
To analyze the sentiment tendency of public opinion, this paper studies text sentiment classification using web data. It uses the BERT (Bidirectional Encoder Representations from Transformers) model in place of the commonly used word2vec model as the text vectorization tool; BERT has stronger semantic representation capabilities and can represent polysemous words according to their context. For the multi-label classification of reviews, the BR (Binary Relevance) algorithm transforms the problem into multiple binary classification problems, which is a direct and efficient way to process multi-label data. A BiLSTM-Attention model is designed, which combines a bidirectional long short-term memory network with an attention mechanism to extract further text features. Multiple sets of comparative experiments verify the effectiveness of the BiLSTM-Attention model through performance evaluation. To further improve performance, the class imbalance in the data set is addressed by adjusting the loss function and tuning various parameters, which yields a better classification effect.
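The Binary Relevance step and the loss adjustment for imbalanced data can be illustrated with a minimal sketch. The abstract does not say how the loss function was adjusted, so the positive-class weighting below (ratio of negatives to positives per label) is an assumption chosen for illustration, not the authors' method. The label names, the sample data, and the helper names `binary_relevance_targets`, `positive_weight`, and `weighted_bce` are all hypothetical.

```python
import math


def binary_relevance_targets(label_sets, labels):
    """Split a multi-label problem into one binary target vector per label."""
    return {
        label: [1 if label in sample else 0 for sample in label_sets]
        for label in labels
    }


def positive_weight(targets):
    """Ratio of negatives to positives; up-weights a rare positive class.

    Assumed weighting scheme, not taken from the paper.
    """
    positives = sum(targets)
    negatives = len(targets) - positives
    return negatives / positives if positives else 1.0


def weighted_bce(targets, probs, pos_weight, eps=1e-12):
    """Mean binary cross-entropy with the positive term scaled by pos_weight."""
    total = 0.0
    for y, p in zip(targets, probs):
        p = min(max(p, eps), 1.0 - eps)  # clamp to avoid log(0)
        total -= pos_weight * y * math.log(p) + (1 - y) * math.log(1.0 - p)
    return total / len(targets)


# Hypothetical review labels, invented for illustration only.
labels = ["service", "price", "quality"]
label_sets = [{"service"}, {"price", "quality"}, {"quality"}, set()]

# One independent binary problem per label (Binary Relevance).
per_label = binary_relevance_targets(label_sets, labels)

# One loss weight per label, so rare labels contribute more per positive sample.
weights = {label: positive_weight(t) for label, t in per_label.items()}
```

In this sketch each label gets its own binary classifier and its own weight, so a label that appears in only one of four reviews has its positive samples weighted three times as heavily as its negatives. In a full pipeline the probabilities passed to `weighted_bce` would come from the classifier head on top of the BERT and BiLSTM-Attention features.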