LASTD: A Manually Annotated and Tested Large Arabic Sentiment Tweets Dataset

2021 the 5th International Conference on Information System and Data Mining. Pub Date: 2021-05-27. DOI: 10.1145/3471287.3471293

Kariman Elshakankery, M. Fayek, Mona Farouk

Citations: 4

Abstract

With the growing attention to Arabic Sentiment Analysis (SA), the availability of annotated datasets has increased. Although acquiring data from social media platforms, microblogs, and similar sources is easy, annotation is the hard part: it requires a great deal of tedious manual work, which remains a major obstacle. In addition, some datasets are built in-house and are not publicly available. This paper introduces LASTD, a manually annotated dataset for Arabic tweet sentiment analysis, along with its statistics and benchmarks. It consists of more than 15K Arabic tweets annotated as positive, negative, or neutral. Using 10-fold cross-validation, three different classifiers were trained and tested on a 3-class and a 2-class classification problem. The support vector machine (SVM) classifier tended to achieve the highest accuracy. LASTD is publicly available for academic research.
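The evaluation protocol mentioned in the abstract can be illustrated with a minimal, standard-library-only sketch of 10-fold cross-validation. It is not the authors' code: the function names, the majority-class placeholder standing in for a real classifier such as an SVM, and the rule that the 2-class task simply drops neutral tweets are all assumptions made for illustration.

```python
import random
from collections import Counter


def k_fold_indices(n, k=10, seed=0):
    """Yield (train_idx, test_idx) pairs for k-fold cross-validation."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]  # k folds of nearly equal size
    for i in range(k):
        test = folds[i]
        train = [j for f_i, f in enumerate(folds) if f_i != i for j in f]
        yield train, test


def majority_fit(texts, labels):
    """Placeholder 'classifier' (hypothetical): most frequent training label."""
    return Counter(labels).most_common(1)[0][0]


def majority_predict(model, texts):
    """Predict the remembered majority label for every input."""
    return [model] * len(texts)


def cross_validate(texts, labels, fit, predict, k=10):
    """Return the mean accuracy over k folds."""
    accuracies = []
    for train, test in k_fold_indices(len(texts), k):
        model = fit([texts[i] for i in train], [labels[i] for i in train])
        preds = predict(model, [texts[i] for i in test])
        correct = sum(p == labels[i] for p, i in zip(preds, test))
        accuracies.append(correct / len(test))
    return sum(accuracies) / len(accuracies)


def to_two_class(texts, labels):
    """Assumption: the 2-class task keeps only positive and negative tweets."""
    kept = [(t, l) for t, l in zip(texts, labels) if l != "neutral"]
    return [t for t, _ in kept], [l for _, l in kept]
```

To compare real classifiers, `majority_fit` and `majority_predict` would be swapped for the training and prediction functions of each model, with the same folds reused so that accuracies are comparable.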

