Twitter情感分析的启发式辅助BERT

Gokul Yenduri, R. RajakumarBoothalingam, K. Praghash, D. Binu
{"title":"Twitter情感分析的启发式辅助BERT","authors":"Gokul Yenduri, R. RajakumarBoothalingam, K. Praghash, D. Binu","doi":"10.1142/s1469026821500152","DOIUrl":null,"url":null,"abstract":"The identification of opinions and sentiments from tweets is termed as “Twitter Sentiment Analysis (TSA)”. The major process of TSA is to determine the sentiment or polarity of the tweet and then classifying them into a negative or positive tweet. There are several methods introduced for carrying out TSA, however, it remains to be challenging due to slang words, modern accents, grammatical and spelling mistakes, and other issues that could not be solved by existing techniques. This work develops a novel customized BERT-oriented sentiment classification that encompasses two main phases: pre-processing and tokenization, and a “Customized Bidirectional Encoder Representations from Transformers (BERT)”-based classification. At first, the gathered raw tweets are pre-processed under stop-word removal, stemming and blank space removal. After pre-processing, the semantic words are obtained, from which the meaningful words (tokens) are extracted in the tokenization phase. Consequently, these extracted tokens are classified via optimized BERT, where biases and weight are tuned optimally by Particle-Assisted Circle Updating Position (PA-CUP). Moreover, the maximal sequence length of the BERT encoder is updated using standard PA-CUP. Finally, the performance analysis is carried out to substantiate the enhancement of the proposed model.","PeriodicalId":422521,"journal":{"name":"Int. J. Comput. Intell. Appl.","volume":"20 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2021-09-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"8","resultStr":"{\"title\":\"Heuristic-Assisted BERT for Twitter Sentiment Analysis\",\"authors\":\"Gokul Yenduri, R. RajakumarBoothalingam, K. Praghash, D. Binu\",\"doi\":\"10.1142/s1469026821500152\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"The identification of opinions and sentiments from tweets is termed as “Twitter Sentiment Analysis (TSA)”. The major process of TSA is to determine the sentiment or polarity of the tweet and then classifying them into a negative or positive tweet. There are several methods introduced for carrying out TSA, however, it remains to be challenging due to slang words, modern accents, grammatical and spelling mistakes, and other issues that could not be solved by existing techniques. This work develops a novel customized BERT-oriented sentiment classification that encompasses two main phases: pre-processing and tokenization, and a “Customized Bidirectional Encoder Representations from Transformers (BERT)”-based classification. At first, the gathered raw tweets are pre-processed under stop-word removal, stemming and blank space removal. After pre-processing, the semantic words are obtained, from which the meaningful words (tokens) are extracted in the tokenization phase. Consequently, these extracted tokens are classified via optimized BERT, where biases and weight are tuned optimally by Particle-Assisted Circle Updating Position (PA-CUP). Moreover, the maximal sequence length of the BERT encoder is updated using standard PA-CUP. Finally, the performance analysis is carried out to substantiate the enhancement of the proposed model.\",\"PeriodicalId\":422521,\"journal\":{\"name\":\"Int. J. Comput. Intell. Appl.\",\"volume\":\"20 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2021-09-01\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"8\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Int. J. Comput. Intell. Appl.\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1142/s1469026821500152\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Int. J. Comput. Intell. Appl.","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1142/s1469026821500152","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 8

摘要

从推特中识别观点和情绪被称为“推特情绪分析(TSA)”。TSA的主要过程是确定tweet的情绪或极性,然后将其分类为消极或积极的tweet。实施TSA有几种方法,然而,由于俚语,现代口音,语法和拼写错误以及其他现有技术无法解决的问题,它仍然具有挑战性。这项工作开发了一种新的定制的面向BERT的情感分类,它包括两个主要阶段:预处理和标记化,以及基于“自定义的双向编码器表示来自变压器(BERT)”的分类。首先,对收集到的原始推文进行停词去除、词干提取和空格去除等预处理。预处理后获得语义词,在标记化阶段从中提取有意义的词(标记)。因此,这些提取的标记通过优化的BERT进行分类,其中偏差和权重通过粒子辅助圆更新位置(PA-CUP)进行优化调整。此外,使用标准PA-CUP更新BERT编码器的最大序列长度。最后,进行了性能分析,以验证所提出模型的改进。
本文章由计算机程序翻译,如有差异,请以英文原文为准。
Heuristic-Assisted BERT for Twitter Sentiment Analysis
The identification of opinions and sentiments from tweets is termed as “Twitter Sentiment Analysis (TSA)”. The major process of TSA is to determine the sentiment or polarity of the tweet and then classifying them into a negative or positive tweet. There are several methods introduced for carrying out TSA, however, it remains to be challenging due to slang words, modern accents, grammatical and spelling mistakes, and other issues that could not be solved by existing techniques. This work develops a novel customized BERT-oriented sentiment classification that encompasses two main phases: pre-processing and tokenization, and a “Customized Bidirectional Encoder Representations from Transformers (BERT)”-based classification. At first, the gathered raw tweets are pre-processed under stop-word removal, stemming and blank space removal. After pre-processing, the semantic words are obtained, from which the meaningful words (tokens) are extracted in the tokenization phase. Consequently, these extracted tokens are classified via optimized BERT, where biases and weight are tuned optimally by Particle-Assisted Circle Updating Position (PA-CUP). Moreover, the maximal sequence length of the BERT encoder is updated using standard PA-CUP. Finally, the performance analysis is carried out to substantiate the enhancement of the proposed model.
求助全文
通过发布文献求助,成功后即可免费获取论文全文。 去求助
来源期刊
自引率
0.00%
发文量
0
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
确定
请完成安全验证×
copy
已复制链接
快去分享给好友吧!
我知道了
右上角分享
点击右上角分享
0
联系我们:info@booksci.cn Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。 Copyright © 2023 布克学术 All rights reserved.
京ICP备2023020795号-1
ghs 京公网安备 11010802042870号
Book学术文献互助
Book学术文献互助群
群 号:481959085
Book学术官方微信