{"title":"利用自回归特征提取和主题建模对低资源语言阿拉伯语进行基于方面的情感分析","authors":"Asmaa Hashem Sweidan, Nashwa El-Bendary, Esraa Elhariri","doi":"10.1145/3638050","DOIUrl":null,"url":null,"abstract":"<p>This paper proposes an approach for aspect-based sentiment analysis of Arabic social data, especially the considerable text corpus generated through communications on Twitter for expressing opinions in Arabic-language tweets during the COVID-19 pandemic. The proposed approach examines the performance of several pre-trained predictive and autoregressive language models; namely, BERT (Bidirectional Encoder Representations from Transformers) and XLNet, along with topic modeling algorithms; namely, LDA (Latent Dirichlet Allocation) and NMF (Non-negative Matrix Factorization), for aspect-based sentiment analysis of online Arabic text. In addition, Bi-LSTM (Bidirectional Long Short Term Memory) deep learning model is used to classify the extracted aspects from online reviews. Obtained experimental results indicate that the combined XLNet-NMF model outperforms other implemented state-of-the-art methods through improving the feature extraction of unstructured social media text with achieving values of 0.946 and 0.938, for average sentiment classification accuracy and F-measure, respectively.</p>","PeriodicalId":54312,"journal":{"name":"ACM Transactions on Asian and Low-Resource Language Information Processing","volume":"22 1","pages":""},"PeriodicalIF":1.8000,"publicationDate":"2023-12-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"Autoregressive Feature Extraction with Topic Modeling for Aspect-based Sentiment Analysis of Arabic as a Low-resource Language\",\"authors\":\"Asmaa Hashem Sweidan, Nashwa El-Bendary, Esraa Elhariri\",\"doi\":\"10.1145/3638050\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"<p>This paper proposes an approach for aspect-based sentiment analysis of Arabic social data, especially the considerable text corpus generated through communications on Twitter for expressing opinions in Arabic-language tweets during the COVID-19 pandemic. The proposed approach examines the performance of several pre-trained predictive and autoregressive language models; namely, BERT (Bidirectional Encoder Representations from Transformers) and XLNet, along with topic modeling algorithms; namely, LDA (Latent Dirichlet Allocation) and NMF (Non-negative Matrix Factorization), for aspect-based sentiment analysis of online Arabic text. In addition, Bi-LSTM (Bidirectional Long Short Term Memory) deep learning model is used to classify the extracted aspects from online reviews. 
Obtained experimental results indicate that the combined XLNet-NMF model outperforms other implemented state-of-the-art methods through improving the feature extraction of unstructured social media text with achieving values of 0.946 and 0.938, for average sentiment classification accuracy and F-measure, respectively.</p>\",\"PeriodicalId\":54312,\"journal\":{\"name\":\"ACM Transactions on Asian and Low-Resource Language Information Processing\",\"volume\":\"22 1\",\"pages\":\"\"},\"PeriodicalIF\":1.8000,\"publicationDate\":\"2023-12-27\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"ACM Transactions on Asian and Low-Resource Language Information Processing\",\"FirstCategoryId\":\"94\",\"ListUrlMain\":\"https://doi.org/10.1145/3638050\",\"RegionNum\":4,\"RegionCategory\":\"计算机科学\",\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"Q3\",\"JCRName\":\"COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"ACM Transactions on Asian and Low-Resource Language Information Processing","FirstCategoryId":"94","ListUrlMain":"https://doi.org/10.1145/3638050","RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q3","JCRName":"COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE","Score":null,"Total":0}
This paper proposes an approach for aspect-based sentiment analysis of Arabic social data, in particular the considerable corpus of Arabic-language tweets expressing opinions on Twitter during the COVID-19 pandemic. The proposed approach examines the performance of several pre-trained predictive and autoregressive language models, namely BERT (Bidirectional Encoder Representations from Transformers) and XLNet, together with topic modeling algorithms, namely LDA (Latent Dirichlet Allocation) and NMF (Non-negative Matrix Factorization), for aspect-based sentiment analysis of online Arabic text. In addition, a Bi-LSTM (Bidirectional Long Short-Term Memory) deep learning model is used to classify the aspects extracted from online reviews. The obtained experimental results indicate that the combined XLNet-NMF model outperforms the other implemented state-of-the-art methods by improving feature extraction from unstructured social media text, achieving an average sentiment classification accuracy of 0.946 and an F-measure of 0.938.
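The pipeline the abstract describes (contextual features from a pre-trained autoregressive encoder, topic/aspect features from NMF, and a Bi-LSTM sentiment classifier) can be sketched as follows. This is a minimal illustrative sketch, not the authors' implementation: it assumes Hugging Face transformers for the XLNet encoder (the `xlnet-base-cased` checkpoint is English-only and merely stands in for an Arabic-capable model), scikit-learn for TF-IDF and NMF, and PyTorch for the Bi-LSTM; the toy tweets, topic count, and the concatenation of topic features with the Bi-LSTM state are assumptions.

```python
# Minimal sketch: XLNet token features + NMF topic features -> Bi-LSTM classifier.
# Illustrative only; checkpoint, topic count, and fusion strategy are assumptions.
import torch
from transformers import AutoTokenizer, AutoModel
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.decomposition import NMF

texts = ["الخدمة ممتازة والتوصيل سريع", "التطبيق سيئ ولا يعمل",
         "اللقاح متوفر في المراكز الصحية", "الأسعار مرتفعة جدا هذا الأسبوع"]  # toy Arabic texts
labels = torch.tensor([1.0, 0.0, 1.0, 0.0])  # toy sentiment labels (1 = positive)

# 1) Contextual token features from a pre-trained XLNet encoder
#    (xlnet-base-cased is English-only; an Arabic-capable checkpoint would be needed in practice).
tok = AutoTokenizer.from_pretrained("xlnet-base-cased")
enc = AutoModel.from_pretrained("xlnet-base-cased")
with torch.no_grad():
    batch = tok(texts, padding=True, truncation=True, return_tensors="pt")
    token_feats = enc(**batch).last_hidden_state            # (n_docs, seq_len, hidden)

# 2) Document-level topic (aspect) features via NMF on TF-IDF weights
tfidf = TfidfVectorizer()
topics = NMF(n_components=2, init="nndsvda", random_state=0).fit_transform(
    tfidf.fit_transform(texts))                              # (n_docs, n_topics)
topics = torch.tensor(topics, dtype=torch.float32)

# 3) Bi-LSTM over token features, topic features concatenated before the output layer
class BiLSTMSentiment(torch.nn.Module):
    def __init__(self, feat_dim, topic_dim, hidden=64):
        super().__init__()
        self.lstm = torch.nn.LSTM(feat_dim, hidden, batch_first=True, bidirectional=True)
        self.out = torch.nn.Linear(2 * hidden + topic_dim, 1)

    def forward(self, seq, top):
        _, (h, _) = self.lstm(seq)                           # h: (2, batch, hidden)
        h = torch.cat([h[0], h[1]], dim=1)                   # forward + backward final states
        return self.out(torch.cat([h, top], dim=1)).squeeze(1)

model = BiLSTMSentiment(token_feats.size(-1), topics.size(-1))
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = torch.nn.BCEWithLogitsLoss()
for _ in range(5):                                           # tiny training loop on the toy data
    opt.zero_grad()
    loss = loss_fn(model(token_feats, topics), labels)
    loss.backward()
    opt.step()
```

The BERT and LDA variants examined in the paper could be approximated in this sketch by swapping the encoder checkpoint passed to `AutoModel.from_pretrained` and by replacing `NMF` with scikit-learn's `LatentDirichletAllocation` (typically fitted on raw term counts rather than TF-IDF weights); how the authors actually combine these components is not specified by the abstract.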
Journal description:
The ACM Transactions on Asian and Low-Resource Language Information Processing (TALLIP) publishes high quality original archival papers and technical notes in the areas of computation and processing of information in Asian languages, low-resource languages of Africa, Australasia, Oceania and the Americas, as well as related disciplines. The subject areas covered by TALLIP include, but are not limited to:
- Computational Linguistics: including computational phonology, computational morphology, computational syntax (e.g. parsing), computational semantics, computational pragmatics, etc.
- Linguistic Resources: including computational lexicography, terminology, electronic dictionaries, cross-lingual dictionaries, electronic thesauri, etc.
- Hardware and software algorithms and tools for Asian or low-resource language processing, e.g., handwritten character recognition.
- Information Understanding: including text understanding, speech understanding, character recognition, discourse processing, dialogue systems, etc.
- Machine Translation involving Asian or low-resource languages.
- Information Retrieval: including natural language processing (NLP) for concept-based indexing, natural language query interfaces, semantic relevance judgments, etc.
- Information Extraction and Filtering: including automatic abstraction, user profiling, etc.
- Speech processing: including text-to-speech synthesis and automatic speech recognition.
- Multimedia Asian Information Processing: including speech, image, video, image/text translation, etc.
- Cross-lingual information processing involving Asian or low-resource languages.
Papers that deal with theory, systems design, evaluation, and applications in the aforesaid subjects are appropriate for TALLIP. Emphasis will be placed on the originality and the practical significance of the reported research.