{"title":"利用自回归特征提取和主题建模对低资源语言阿拉伯语进行基于方面的情感分析","authors":"Asmaa Hashem Sweidan, Nashwa El-Bendary, Esraa Elhariri","doi":"10.1145/3638050","DOIUrl":null,"url":null,"abstract":"<p>This paper proposes an approach for aspect-based sentiment analysis of Arabic social data, especially the considerable text corpus generated through communications on Twitter for expressing opinions in Arabic-language tweets during the COVID-19 pandemic. The proposed approach examines the performance of several pre-trained predictive and autoregressive language models; namely, BERT (Bidirectional Encoder Representations from Transformers) and XLNet, along with topic modeling algorithms; namely, LDA (Latent Dirichlet Allocation) and NMF (Non-negative Matrix Factorization), for aspect-based sentiment analysis of online Arabic text. In addition, Bi-LSTM (Bidirectional Long Short Term Memory) deep learning model is used to classify the extracted aspects from online reviews. Obtained experimental results indicate that the combined XLNet-NMF model outperforms other implemented state-of-the-art methods through improving the feature extraction of unstructured social media text with achieving values of 0.946 and 0.938, for average sentiment classification accuracy and F-measure, respectively.</p>","PeriodicalId":54312,"journal":{"name":"ACM Transactions on Asian and Low-Resource Language Information Processing","volume":"22 1","pages":""},"PeriodicalIF":1.8000,"publicationDate":"2023-12-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"Autoregressive Feature Extraction with Topic Modeling for Aspect-based Sentiment Analysis of Arabic as a Low-resource Language\",\"authors\":\"Asmaa Hashem Sweidan, Nashwa El-Bendary, Esraa Elhariri\",\"doi\":\"10.1145/3638050\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"<p>This paper proposes an approach for aspect-based sentiment analysis of Arabic social data, especially the considerable text corpus generated through communications on Twitter for expressing opinions in Arabic-language tweets during the COVID-19 pandemic. The proposed approach examines the performance of several pre-trained predictive and autoregressive language models; namely, BERT (Bidirectional Encoder Representations from Transformers) and XLNet, along with topic modeling algorithms; namely, LDA (Latent Dirichlet Allocation) and NMF (Non-negative Matrix Factorization), for aspect-based sentiment analysis of online Arabic text. In addition, Bi-LSTM (Bidirectional Long Short Term Memory) deep learning model is used to classify the extracted aspects from online reviews. 
Obtained experimental results indicate that the combined XLNet-NMF model outperforms other implemented state-of-the-art methods through improving the feature extraction of unstructured social media text with achieving values of 0.946 and 0.938, for average sentiment classification accuracy and F-measure, respectively.</p>\",\"PeriodicalId\":54312,\"journal\":{\"name\":\"ACM Transactions on Asian and Low-Resource Language Information Processing\",\"volume\":\"22 1\",\"pages\":\"\"},\"PeriodicalIF\":1.8000,\"publicationDate\":\"2023-12-27\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"ACM Transactions on Asian and Low-Resource Language Information Processing\",\"FirstCategoryId\":\"94\",\"ListUrlMain\":\"https://doi.org/10.1145/3638050\",\"RegionNum\":4,\"RegionCategory\":\"计算机科学\",\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"Q3\",\"JCRName\":\"COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"ACM Transactions on Asian and Low-Resource Language Information Processing","FirstCategoryId":"94","ListUrlMain":"https://doi.org/10.1145/3638050","RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q3","JCRName":"COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE","Score":null,"Total":0}
This paper proposes an approach for aspect-based sentiment analysis of Arabic social data, in particular the considerable corpus of Arabic-language tweets expressing opinions on Twitter during the COVID-19 pandemic. The proposed approach examines the performance of several pre-trained predictive and autoregressive language models, namely BERT (Bidirectional Encoder Representations from Transformers) and XLNet, together with topic modeling algorithms, namely LDA (Latent Dirichlet Allocation) and NMF (Non-negative Matrix Factorization), for aspect-based sentiment analysis of online Arabic text. In addition, a Bi-LSTM (Bidirectional Long Short-Term Memory) deep learning model is used to classify the aspects extracted from online reviews. The obtained experimental results indicate that the combined XLNet-NMF model outperforms the other implemented state-of-the-art methods by improving feature extraction from unstructured social media text, achieving an average sentiment classification accuracy of 0.946 and an F-measure of 0.938.
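The pipeline the abstract describes (contextual features from a pre-trained autoregressive encoder, topic/aspect features from NMF, and a Bi-LSTM sentiment classifier) can be sketched as follows. This is a minimal illustrative sketch, not the authors' implementation: it assumes Hugging Face transformers for the XLNet encoder (the `xlnet-base-cased` checkpoint is English-only and merely stands in for an Arabic-capable model), scikit-learn for TF-IDF and NMF, and PyTorch for the Bi-LSTM; the toy tweets, topic count, and the concatenation of topic features with the Bi-LSTM state are assumptions.

```python
# Minimal sketch: XLNet token features + NMF topic features -> Bi-LSTM classifier.
# Illustrative only; checkpoint, topic count, and fusion strategy are assumptions.
import torch
from transformers import AutoTokenizer, AutoModel
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.decomposition import NMF

texts = ["الخدمة ممتازة والتوصيل سريع", "التطبيق سيئ ولا يعمل",
         "اللقاح متوفر في المراكز الصحية", "الأسعار مرتفعة جدا هذا الأسبوع"]  # toy Arabic texts
labels = torch.tensor([1.0, 0.0, 1.0, 0.0])  # toy sentiment labels (1 = positive)

# 1) Contextual token features from a pre-trained XLNet encoder
#    (xlnet-base-cased is English-only; an Arabic-capable checkpoint would be needed in practice).
tok = AutoTokenizer.from_pretrained("xlnet-base-cased")
enc = AutoModel.from_pretrained("xlnet-base-cased")
with torch.no_grad():
    batch = tok(texts, padding=True, truncation=True, return_tensors="pt")
    token_feats = enc(**batch).last_hidden_state            # (n_docs, seq_len, hidden)

# 2) Document-level topic (aspect) features via NMF on TF-IDF weights
tfidf = TfidfVectorizer()
topics = NMF(n_components=2, init="nndsvda", random_state=0).fit_transform(
    tfidf.fit_transform(texts))                              # (n_docs, n_topics)
topics = torch.tensor(topics, dtype=torch.float32)

# 3) Bi-LSTM over token features, topic features concatenated before the output layer
class BiLSTMSentiment(torch.nn.Module):
    def __init__(self, feat_dim, topic_dim, hidden=64):
        super().__init__()
        self.lstm = torch.nn.LSTM(feat_dim, hidden, batch_first=True, bidirectional=True)
        self.out = torch.nn.Linear(2 * hidden + topic_dim, 1)

    def forward(self, seq, top):
        _, (h, _) = self.lstm(seq)                           # h: (2, batch, hidden)
        h = torch.cat([h[0], h[1]], dim=1)                   # forward + backward final states
        return self.out(torch.cat([h, top], dim=1)).squeeze(1)

model = BiLSTMSentiment(token_feats.size(-1), topics.size(-1))
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = torch.nn.BCEWithLogitsLoss()
for _ in range(5):                                           # tiny training loop on the toy data
    opt.zero_grad()
    loss = loss_fn(model(token_feats, topics), labels)
    loss.backward()
    opt.step()
```

The BERT and LDA variants examined in the paper could be approximated in this sketch by swapping the encoder checkpoint passed to `AutoModel.from_pretrained` and by replacing `NMF` with scikit-learn's `LatentDirichletAllocation` (typically fitted on raw term counts rather than TF-IDF weights); how the authors actually combine these components is not specified by the abstract.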
Journal description:
The ACM Transactions on Asian and Low-Resource Language Information Processing (TALLIP) publishes high quality original archival papers and technical notes in the areas of computation and processing of information in Asian languages, low-resource languages of Africa, Australasia, Oceania and the Americas, as well as related disciplines. The subject areas covered by TALLIP include, but are not limited to:
- Computational Linguistics: including computational phonology, computational morphology, computational syntax (e.g. parsing), computational semantics, computational pragmatics, etc.
- Linguistic Resources: including computational lexicography, terminology, electronic dictionaries, cross-lingual dictionaries, electronic thesauri, etc.
- Hardware and software algorithms and tools for Asian or low-resource language processing, e.g., handwritten character recognition.
- Information Understanding: including text understanding, speech understanding, character recognition, discourse processing, dialogue systems, etc.
- Machine Translation involving Asian or low-resource languages.
- Information Retrieval: including natural language processing (NLP) for concept-based indexing, natural language query interfaces, semantic relevance judgments, etc.
- Information Extraction and Filtering: including automatic abstraction, user profiling, etc.
- Speech processing: including text-to-speech synthesis and automatic speech recognition.
- Multimedia Asian Information Processing: including speech, image, video, image/text translation, etc.
- Cross-lingual information processing involving Asian or low-resource languages.
Papers that deal with theory, systems design, evaluation, and applications in the aforesaid subjects are appropriate for TALLIP. Emphasis will be placed on the originality and the practical significance of the reported research.