Arabic Opinion Mining Using Distributed Representations of Documents

A. El-Halees
2017 Palestinian International Conference on Information and Communication Technology (PICICT)
Published: 2017-05-08
DOI: 10.1109/PICICT.2017.15 (https://doi.org/10.1109/PICICT.2017.15)
Citations: 11

Abstract

Nowadays, many people express their opinions through user-generated content such as social media, forums, and reviews. Opinion mining is a field of study that extracts sentiment from user-generated content. Because of the complexity of the Arabic language, extracting those opinions is challenging. A better representation of reviews can help improve opinion extraction. The traditional way of representing opinion documents is Bag-of-Words (BOW), in which text is represented as a fixed-length vector. The problem with this representation is that it loses word order, ignores grammatical structure, and is lexicon-dependent. To overcome these limitations, distributed representations can be employed. They are based on learning vector representations of words, also called "word embeddings". With the help of learning algorithms, they can improve performance on natural language processing tasks. This representation uses neural networks, and the learned vectors explicitly encode many linguistic patterns. In this study, we used distributed representations for Arabic opinion mining and compared them with the BOW representation. We applied both to four benchmark datasets. Then, we used four machine learning methods, including Support Vector Machine, Logistic Regression, and Random Forest. Using the F-measure metric, we found that, across all datasets and all methods used in our experiments, the distributed representations performed better than the bag-of-words representation.
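
The word-order limitation of Bag-of-Words described in the abstract can be illustrated with a minimal sketch. It uses only the Python standard library; the helper `bag_of_words` and the two toy English reviews are hypothetical illustrations, not the paper's data or implementation. The sketch shows only the BOW side. Learning a distributed representation would require a trained neural model and is not attempted here.

```python
from collections import Counter


def bag_of_words(tokens, vocabulary):
    """Return a fixed-length count vector with one slot per vocabulary word."""
    counts = Counter(tokens)
    return [counts[word] for word in vocabulary]


# Two toy reviews (hypothetical data) that use the same words in a
# different order and therefore express opposite opinions.
review_a = "the food was good not bad".split()
review_b = "the food was bad not good".split()

# The vocabulary fixes the vector length for every document.
vocabulary = sorted(set(review_a) | set(review_b))

vector_a = bag_of_words(review_a, vocabulary)
vector_b = bag_of_words(review_b, vocabulary)

print(vocabulary)
print(vector_a)
print(vector_b)
# Word order is discarded, so the two vectors cannot be told apart.
print(vector_a == vector_b)
```

Because both reviews map to the same count vector, a classifier trained on such vectors cannot distinguish them, which is the kind of loss that an order-aware representation is meant to reduce.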