Sentiment Analysis of Informal Persian Texts Using Embedding Informal words and Attention-Based LSTM Network

2020 8th Iranian Joint Congress on Fuzzy and intelligent Systems (CFIS)
Pub Date: 2020-09-01
DOI: 10.1109/CFIS49607.2020.9238699

M. Karrabi, Leila Oskooie, M. Bakhtiar, Mohammad Farahani, R. Monsefi

Citations: 3

Abstract

The massive volume of comments on websites and social networks makes it possible to learn about people's beliefs and preferences regarding products and services on a large scale. Sentiment analysis, which determines the sentiment expressed in a text, has been proposed as an intelligent solution for this purpose. Methodologically, combining word embeddings with deep neural networks (DNNs) has recently become an effective approach to sentiment analysis. In Persian studies, formal corpora such as Wikipedia dumps have been used to train word embeddings. Because formal and informal texts differ fundamentally, vectors derived from formal texts do not yield desirable accuracy in informal contexts such as social networks. To overcome this drawback, we build a large integrated text corpus from several different sources of informal comments and use fastText as the word embedding algorithm. We use an attention-based LSTM, which has been shown to outperform similar methods in sentiment analysis for English. The proposed method is evaluated on two Persian datasets, “Taaghche” and “Filimo”, collected in this work. Experiments on both datasets show that using informal vectors and applying the attention model improve the prediction accuracy of the DNN in sentiment analysis of Persian texts.
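
The abstract does not give the attention formulation, so the following is only a minimal sketch of the general idea behind attention over LSTM outputs, not the authors' implementation. It assumes simple dot-product scoring against a single scoring vector; the names `attention_pool`, `hidden_states`, and `query` are hypothetical, and the toy numbers stand in for the per-token hidden states that an LSTM would produce and the scoring vector that training would learn.

```python
import math
from typing import List, Sequence, Tuple


def softmax(scores: Sequence[float]) -> List[float]:
    """Numerically stable softmax: subtract the max before exponentiating."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_pool(
    hidden_states: Sequence[Sequence[float]],
    query: Sequence[float],
) -> Tuple[List[float], List[float]]:
    """Collapse per-token hidden states into one sentence vector.

    hidden_states: one vector per token (assumed to come from an LSTM).
    query: scoring vector (assumed learned; fixed by hand in this sketch).
    Returns (attention weights, weighted-sum context vector).
    """
    # Score each token by the dot product of its hidden state with the query.
    scores = [sum(q * h for q, h in zip(query, state)) for state in hidden_states]
    # Normalise the scores so the weights are positive and sum to 1.
    weights = softmax(scores)
    # The context vector is the attention-weighted sum of the hidden states.
    dim = len(hidden_states[0])
    context = [
        sum(w * state[i] for w, state in zip(weights, hidden_states))
        for i in range(dim)
    ]
    return weights, context


# Toy input: three tokens with 2-dimensional hidden states (made-up values).
hidden_states = [[0.1, 0.0], [2.0, 1.0], [0.0, 0.2]]
query = [1.0, 1.0]
weights, context = attention_pool(hidden_states, query)
print("weights:", weights)
print("context:", context)
```

In a sentiment classifier, the context vector would typically feed a final dense layer, and the weights indicate which tokens contributed most to the prediction. In this toy input the second token has the highest score and therefore receives the largest weight.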
