A Corpus Based Approach to Build Arabic Sentiment Lexicon

International Journal of Information Engineering and Electronic Business. Pub Date: 2019-11-08. DOI: 10.5815/ijieeb.2019.06.03

Afnan Alsolamy, M. Siddiqui, Imtiaz Hussain Khan


Citations: 4



A Corpus Based Approach to Build Arabic Sentiment Lexicon

Sentiment analysis is an application of artificial intelligence that determines the sentiment associated with a piece of text. It gives a brand or company an easy way to collect customers' opinions about its products from user-generated content such as social media posts. Training a machine learning model for sentiment analysis requires resources such as labeled corpora and sentiment lexicons. While such resources are readily available for English, they are hard to find for other languages such as Arabic. The aim of this research is to build an Arabic sentiment lexicon using a corpus-based approach. Sentiment scores were propagated from a small, manually labeled seed list to other terms in a term co-occurrence graph. To achieve this, we proposed a graph propagation algorithm and compared different similarity measures. The use of similarity measures rests on the assumption that words appearing in the same context tend to have similar polarity. The lexicon was evaluated using a manually annotated list of terms. The main contribution of the work is the empirical evaluation of different similarity measures for assigning the best sentiment scores to terms in the co-occurrence graph.
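
The propagation idea in the abstract can be illustrated with a minimal Python sketch. It is not the authors' algorithm: the abstract does not give the update rule or name the similarity measures that were compared. The sketch assumes document-level co-occurrence, cosine similarity between co-occurrence vectors as the edge weight, and a similarity-weighted average of neighbor scores with seed scores held fixed. All function names and the toy English tokens are hypothetical stand-ins for Arabic terms.

```python
from collections import Counter, defaultdict
from itertools import combinations
from math import sqrt


def cooccurrence_counts(documents):
    """Count how often each pair of distinct terms appears in the same document."""
    counts = defaultdict(Counter)
    for tokens in documents:
        for a, b in combinations(sorted(set(tokens)), 2):
            counts[a][b] += 1
            counts[b][a] += 1
    return counts


def cosine(u, v):
    """Cosine similarity between two sparse count vectors (assumed measure)."""
    dot = sum(u[k] * v[k] for k in u.keys() & v.keys())
    norm = sqrt(sum(x * x for x in u.values())) * sqrt(sum(x * x for x in v.values()))
    return dot / norm if norm else 0.0


def propagate(counts, seeds, iterations=10):
    """Spread seed scores over the graph; seed terms keep their labeled score."""
    scores = {term: seeds.get(term, 0.0) for term in counts}
    for _ in range(iterations):
        updated = {}
        for term in counts:
            if term in seeds:
                updated[term] = seeds[term]
                continue
            # Weight each neighbor by how similar its context is to this term's.
            weights = {n: cosine(counts[term], counts[n]) for n in counts[term]}
            total = sum(weights.values())
            updated[term] = (
                sum(w * scores[n] for n, w in weights.items()) / total if total else 0.0
            )
        scores = updated
    return scores


# Toy corpus: each inner list is one tokenized document.
docs = [
    ["excellent", "service", "fast"],
    ["excellent", "fast", "delivery"],
    ["terrible", "service", "slow"],
    ["terrible", "slow", "delivery"],
]
seeds = {"excellent": 1.0, "terrible": -1.0}  # manually labeled seed list
scores = propagate(cooccurrence_counts(docs), seeds)
```

In this toy corpus, "fast" shares its contexts with the positive seed and ends with a positive score, "slow" ends with a negative one, and terms that co-occur equally with both seeds stay near zero. Swapping `cosine` for another measure is the kind of comparison the abstract describes.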

