Twitter情感分析的启发式辅助BERT

Int. J. Comput. Intell. Appl. Pub Date : 2021-09-01 DOI:10.1142/s1469026821500152

Gokul Yenduri, R. RajakumarBoothalingam, K. Praghash, D. Binu

{"title":"Twitter情感分析的启发式辅助BERT","authors":"Gokul Yenduri, R. RajakumarBoothalingam, K. Praghash, D. Binu","doi":"10.1142/s1469026821500152","DOIUrl":null,"url":null,"abstract":"The identification of opinions and sentiments from tweets is termed as “Twitter Sentiment Analysis (TSA)”. The major process of TSA is to determine the sentiment or polarity of the tweet and then classifying them into a negative or positive tweet. There are several methods introduced for carrying out TSA, however, it remains to be challenging due to slang words, modern accents, grammatical and spelling mistakes, and other issues that could not be solved by existing techniques. This work develops a novel customized BERT-oriented sentiment classification that encompasses two main phases: pre-processing and tokenization, and a “Customized Bidirectional Encoder Representations from Transformers (BERT)”-based classification. At first, the gathered raw tweets are pre-processed under stop-word removal, stemming and blank space removal. After pre-processing, the semantic words are obtained, from which the meaningful words (tokens) are extracted in the tokenization phase. Consequently, these extracted tokens are classified via optimized BERT, where biases and weight are tuned optimally by Particle-Assisted Circle Updating Position (PA-CUP). Moreover, the maximal sequence length of the BERT encoder is updated using standard PA-CUP. Finally, the performance analysis is carried out to substantiate the enhancement of the proposed model.","PeriodicalId":422521,"journal":{"name":"Int. J. Comput. Intell. Appl.","volume":"20 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2021-09-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"8","resultStr":"{\"title\":\"Heuristic-Assisted BERT for Twitter Sentiment Analysis\",\"authors\":\"Gokul Yenduri, R. RajakumarBoothalingam, K. Praghash, D. Binu\",\"doi\":\"10.1142/s1469026821500152\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"The identification of opinions and sentiments from tweets is termed as “Twitter Sentiment Analysis (TSA)”. The major process of TSA is to determine the sentiment or polarity of the tweet and then classifying them into a negative or positive tweet. There are several methods introduced for carrying out TSA, however, it remains to be challenging due to slang words, modern accents, grammatical and spelling mistakes, and other issues that could not be solved by existing techniques. This work develops a novel customized BERT-oriented sentiment classification that encompasses two main phases: pre-processing and tokenization, and a “Customized Bidirectional Encoder Representations from Transformers (BERT)”-based classification. At first, the gathered raw tweets are pre-processed under stop-word removal, stemming and blank space removal. After pre-processing, the semantic words are obtained, from which the meaningful words (tokens) are extracted in the tokenization phase. Consequently, these extracted tokens are classified via optimized BERT, where biases and weight are tuned optimally by Particle-Assisted Circle Updating Position (PA-CUP). Moreover, the maximal sequence length of the BERT encoder is updated using standard PA-CUP. Finally, the performance analysis is carried out to substantiate the enhancement of the proposed model.\",\"PeriodicalId\":422521,\"journal\":{\"name\":\"Int. J. Comput. Intell. Appl.\",\"volume\":\"20 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2021-09-01\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"8\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Int. J. Comput. Intell. Appl.\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1142/s1469026821500152\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Int. J. Comput. Intell. Appl.","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1142/s1469026821500152","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}

引用次数: 8

摘要

从推特中识别观点和情绪被称为“推特情绪分析(TSA)”。TSA的主要过程是确定tweet的情绪或极性，然后将其分类为消极或积极的tweet。实施TSA有几种方法，然而，由于俚语，现代口音，语法和拼写错误以及其他现有技术无法解决的问题，它仍然具有挑战性。这项工作开发了一种新的定制的面向BERT的情感分类，它包括两个主要阶段:预处理和标记化，以及基于“自定义的双向编码器表示来自变压器(BERT)”的分类。首先，对收集到的原始推文进行停词去除、词干提取和空格去除等预处理。预处理后获得语义词，在标记化阶段从中提取有意义的词(标记)。因此，这些提取的标记通过优化的BERT进行分类，其中偏差和权重通过粒子辅助圆更新位置(PA-CUP)进行优化调整。此外，使用标准PA-CUP更新BERT编码器的最大序列长度。最后，进行了性能分析，以验证所提出模型的改进。

本文章由计算机程序翻译，如有差异，请以英文原文为准。

查看原文本刊更多论文

Heuristic-Assisted BERT for Twitter Sentiment Analysis

The identification of opinions and sentiments from tweets is termed as “Twitter Sentiment Analysis (TSA)”. The major process of TSA is to determine the sentiment or polarity of the tweet and then classifying them into a negative or positive tweet. There are several methods introduced for carrying out TSA, however, it remains to be challenging due to slang words, modern accents, grammatical and spelling mistakes, and other issues that could not be solved by existing techniques. This work develops a novel customized BERT-oriented sentiment classification that encompasses two main phases: pre-processing and tokenization, and a “Customized Bidirectional Encoder Representations from Transformers (BERT)”-based classification. At first, the gathered raw tweets are pre-processed under stop-word removal, stemming and blank space removal. After pre-processing, the semantic words are obtained, from which the meaningful words (tokens) are extracted in the tokenization phase. Consequently, these extracted tokens are classified via optimized BERT, where biases and weight are tuned optimally by Particle-Assisted Circle Updating Position (PA-CUP). Moreover, the maximal sequence length of the BERT encoder is updated using standard PA-CUP. Finally, the performance analysis is carried out to substantiate the enhancement of the proposed model.

求助全文

通过发布文献求助，成功后即可免费获取论文全文。去求助

来源期刊

Int. J. Comput. Intell. Appl.

自引率

0.00%

发文量