Document Level Emotion Detection from Bangla Text Using Machine Learning Techniques

Sadia Afrin Purba, Sadia Tasnim, Mobasshira Jabin, Tahmim Hossen, Md. Khairul Hasan
Published in: 2021 International Conference on Information and Communication Technology for Sustainable Development (ICICT4SD)
Publication date: 2021-02-27
DOI: https://doi.org/10.1109/ICICT4SD50815.2021.9397036
Citations: 4

Abstract

Automatically understanding emotion from documents is an interesting research topic in machine learning. Nowadays, many applications, such as email and blog platforms, can suggest joyful or angry expressions from written documents. Despite being a popular language, Bangla lacks a rich corpus with annotated emotion labels, so emotion recognition from documents is not yet as developed as it is for other languages. In this work, we propose a new dataset of Bangla documents annotated with three emotions: Happy, Sad, and Angry. Two major feature extraction techniques, Bag of Words (BoW) and Word Embedding, are used to extract features from the documents. BoW features are used by the Logistic Regression and Multinomial Naive Bayes classifiers, and Word Embedding features are used by the Artificial Neural Network (ANN) and Convolutional Neural Network (CNN) classifiers. Among all of these, the Multinomial Naive Bayes classifier gives the best performance on the test set, with an accuracy of 68.27%. We have made our dataset (https://doi.org/10.6084/m9.figshare.13052789.v1) available to all for use in further research.
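
To make the best-performing combination concrete, the following is a minimal sketch of Bag of Words features feeding a Multinomial Naive Bayes classifier, written with the Python standard library only. It is not the authors' implementation: the whitespace tokenizer, the Laplace smoothing value, the class and function names, and the English toy sentences (standing in for the Bangla documents) are all assumptions made for illustration. Only the three labels, Happy, Sad, and Angry, come from the abstract.

```python
import math
from collections import Counter


def tokenize(text):
    # Assumption: plain whitespace tokenization; the paper's preprocessing is not specified here.
    return text.lower().split()


class BowMultinomialNB:
    """Multinomial Naive Bayes over Bag-of-Words counts with Laplace smoothing."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha  # smoothing strength (hypothetical default)

    def fit(self, docs, labels):
        self.classes = sorted(set(labels))
        # Bag of Words: per-class word counts, ignoring word order.
        self.word_counts = {c: Counter() for c in self.classes}
        for doc, label in zip(docs, labels):
            self.word_counts[label].update(tokenize(doc))
        self.vocab = set()
        for counts in self.word_counts.values():
            self.vocab.update(counts)
        # Class priors estimated from document frequencies.
        doc_counts = Counter(labels)
        self.log_prior = {c: math.log(doc_counts[c] / len(docs)) for c in self.classes}
        self.total = {c: sum(self.word_counts[c].values()) for c in self.classes}
        return self

    def _log_likelihood(self, word, c):
        num = self.word_counts[c][word] + self.alpha
        den = self.total[c] + self.alpha * len(self.vocab)
        return math.log(num / den)

    def predict(self, doc):
        # Words never seen in training are skipped.
        tokens = [t for t in tokenize(doc) if t in self.vocab]
        scores = {
            c: self.log_prior[c] + sum(self._log_likelihood(t, c) for t in tokens)
            for c in self.classes
        }
        return max(scores, key=scores.get)


def accuracy(model, docs, labels):
    # Fraction of documents whose predicted label matches the annotation.
    correct = sum(model.predict(d) == y for d, y in zip(docs, labels))
    return correct / len(docs)


# Toy data: English placeholders, NOT taken from the published dataset.
train_docs = [
    "what a wonderful joyful day",
    "i am so glad and joyful",
    "this loss makes me cry",
    "i feel lonely and cry",
    "i am furious at this insult",
    "this insult makes me furious",
]
train_labels = ["Happy", "Happy", "Sad", "Sad", "Angry", "Angry"]
test_docs = ["a joyful day", "i cry", "furious insult"]
test_labels = ["Happy", "Sad", "Angry"]

model = BowMultinomialNB().fit(train_docs, train_labels)
print(accuracy(model, test_docs, test_labels))
```

The accuracy helper mirrors the metric quoted in the abstract (correct predictions divided by test documents). On real Bangla text, a language-appropriate tokenizer would likely be needed in place of the whitespace split.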