Data Set For Sentiment Analysis On Bengali News Comments And Its Baseline Evaluation

2019 International Conference on Bangla Speech and Language Processing (ICBSLP), Pub Date: 2019-09-01, DOI: 10.1109/ICBSLP47725.2019.201497

Md. Akhter-Uz-Zaman Ashik, S. Shovon, Summit Haque


Citations: 15

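Abstract

The biggest challenge in Bengali language processing is building a strong data set for research. This paper introduces an authentic and credible data set for Bengali sentiment analysis, extracted from user comments on a well-known online news portal; the data set is open to all for educational purposes. Comments on various news items were scraped, and five sentiment labels were used to capture the true sentiment of each sentence. An online crowdsourcing platform was used for data annotation. To ensure the credibility and validity of the data set, every entry was tagged three times. Three text classification models were used for baseline evaluation to check the validity of the data set. This data set may be a valuable aid for future work and research on Bengali sentiment analysis.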
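The abstract says every entry was tagged three times but does not say how the three tags were combined into a final label. As a minimal sketch, assuming a strict majority vote (a common choice, not confirmed by the paper), the aggregation could look like the following. The label names and the helper names `aggregate_tags` and `full_agreement_rate` are hypothetical and used only for illustration.

```python
from collections import Counter
from typing import Optional, Sequence


def aggregate_tags(tags: Sequence[str]) -> Optional[str]:
    """Return the label chosen by a strict majority of annotators.

    Returns None when no label has more than half of the votes,
    so that ambiguous entries can be reviewed or dropped.
    Majority voting is an assumption; the paper's abstract does not
    describe its aggregation rule.
    """
    if not tags:
        return None
    label, count = Counter(tags).most_common(1)[0]
    return label if count > len(tags) / 2 else None


def full_agreement_rate(rows: Sequence[Sequence[str]]) -> float:
    """Fraction of entries on which all annotators chose the same label."""
    if not rows:
        return 0.0
    return sum(len(set(row)) == 1 for row in rows) / len(rows)


# Hypothetical annotations: three tags per comment.
# The label names are placeholders, not the paper's five labels.
rows = [
    ("positive", "positive", "neutral"),   # two of three agree
    ("negative", "negative", "negative"),  # unanimous
    ("neutral", "positive", "negative"),   # no majority
]

resolved = [aggregate_tags(row) for row in rows]
print(resolved)  # ['positive', 'negative', None]
print(full_agreement_rate(rows))
```

Returning `None` for three-way disagreements keeps the ambiguity visible instead of silently picking a label; with five labels and three annotators such ties are possible, and how to handle them is a design decision left open here.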




Source journal

2019 International Conference on Bangla Speech and Language Processing (ICBSLP)

Self-citation rate

0.00%

Publication count