孟加拉语新闻评论情感分析数据集及其基线评价

Md. Akhter-Uz-Zaman Ashik, S. Shovon, Summit Haque
{"title":"孟加拉语新闻评论情感分析数据集及其基线评价","authors":"Md. Akhter-Uz-Zaman Ashik, S. Shovon, Summit Haque","doi":"10.1109/ICBSLP47725.2019.201497","DOIUrl":null,"url":null,"abstract":"The biggest challenge of Bengali language processing is creating a strong data set to do research on. The main focus of this paper is to introduce an authentic and credible data set and this dataset is open for all to be used for educational purposes1 for Bengali sentiment analysis where the data was extracted from a well known online news portal’s user comments. Here comments on various news were scraped, and for detecting the true sentiments of the sentences, five labels of sentiments were used. An online crowd sourcing platform was used for data annotation. To ensure the credibility and validity of the data set, every entry of the data set was tagged three times. Three models of text classification were used for baseline evaluation to check the validity of the data set. This data set might be of valuable help for future works and researches on Bengali sentiment analysis.","PeriodicalId":413077,"journal":{"name":"2019 International Conference on Bangla Speech and Language Processing (ICBSLP)","volume":"1 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2019-09-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"15","resultStr":"{\"title\":\"Data Set For Sentiment Analysis On Bengali News Comments And Its Baseline Evaluation\",\"authors\":\"Md. Akhter-Uz-Zaman Ashik, S. Shovon, Summit Haque\",\"doi\":\"10.1109/ICBSLP47725.2019.201497\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"The biggest challenge of Bengali language processing is creating a strong data set to do research on. The main focus of this paper is to introduce an authentic and credible data set and this dataset is open for all to be used for educational purposes1 for Bengali sentiment analysis where the data was extracted from a well known online news portal’s user comments. Here comments on various news were scraped, and for detecting the true sentiments of the sentences, five labels of sentiments were used. An online crowd sourcing platform was used for data annotation. To ensure the credibility and validity of the data set, every entry of the data set was tagged three times. Three models of text classification were used for baseline evaluation to check the validity of the data set. This data set might be of valuable help for future works and researches on Bengali sentiment analysis.\",\"PeriodicalId\":413077,\"journal\":{\"name\":\"2019 International Conference on Bangla Speech and Language Processing (ICBSLP)\",\"volume\":\"1 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2019-09-01\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"15\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"2019 International Conference on Bangla Speech and Language Processing (ICBSLP)\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1109/ICBSLP47725.2019.201497\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"2019 International Conference on Bangla Speech and Language Processing (ICBSLP)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/ICBSLP47725.2019.201497","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 15

摘要

孟加拉语处理的最大挑战是创建一个强大的数据集来进行研究。本文的主要重点是介绍一个真实可信的数据集,该数据集对所有人开放,用于教育目的1,用于孟加拉情感分析,其中数据是从一个知名的在线新闻门户网站的用户评论中提取的。这里收集了各种新闻的评论,为了检测句子的真实情感,使用了五种情感标签。利用网络众包平台进行数据标注。为了保证数据集的可信度和有效性,数据集的每个条目都被标记了三次。使用三种文本分类模型进行基线评估,以检查数据集的有效性。该数据集可能对未来孟加拉语情感分析的工作和研究提供有价值的帮助。
本文章由计算机程序翻译,如有差异,请以英文原文为准。
Data Set For Sentiment Analysis On Bengali News Comments And Its Baseline Evaluation
The biggest challenge of Bengali language processing is creating a strong data set to do research on. The main focus of this paper is to introduce an authentic and credible data set and this dataset is open for all to be used for educational purposes1 for Bengali sentiment analysis where the data was extracted from a well known online news portal’s user comments. Here comments on various news were scraped, and for detecting the true sentiments of the sentences, five labels of sentiments were used. An online crowd sourcing platform was used for data annotation. To ensure the credibility and validity of the data set, every entry of the data set was tagged three times. Three models of text classification were used for baseline evaluation to check the validity of the data set. This data set might be of valuable help for future works and researches on Bengali sentiment analysis.
求助全文
通过发布文献求助,成功后即可免费获取论文全文。 去求助
来源期刊
自引率
0.00%
发文量
0
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
确定
请完成安全验证×
copy
已复制链接
快去分享给好友吧!
我知道了
右上角分享
点击右上角分享
0
联系我们:info@booksci.cn Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。 Copyright © 2023 布克学术 All rights reserved.
京ICP备2023020795号-1
ghs 京公网安备 11010802042870号
Book学术文献互助
Book学术文献互助群
群 号:481959085
Book学术官方微信