Sentiment analysis using Latent Dirichlet Allocation and topic polarity wordcloud visualization

M. F. A. Bashri, R. Kusumaningrum
2017 5th International Conference on Information and Communication Technology (ICoIC7), published 2017-05-17
DOI: 10.1109/ICOICT.2017.8074651 (https://doi.org/10.1109/ICOICT.2017.8074651)
Citations: 25

Abstract

Sentiment analysis is the field of study that analyzes the sentiment expressed in text. One method for it is Latent Dirichlet Allocation (LDA), which extracts the topics of a set of documents, where each topic is represented as a set of words with different probabilities of belonging to that topic. Such output is hard to interpret as text or tables, so a visual representation is needed. One such form is the wordcloud, which visualizes word frequency. This research performs sentiment analysis on students' comments about a university, in this case Universitas Diponegoro, using LDA and topic polarity wordcloud visualization. The purpose of the study is to generate the topic polarity wordcloud of the students' comments using the best combination of parameters. The best combination is alpha = 0.1, beta = 0.1, 9 topics, and a threshold of 10^-7, which gives a perplexity of 8.07. This combination produces 3 topics with positive sentiment and 6 topics with negative sentiment. The proposed method is also compared with Naïve Bayes and Logistic Regression. The final result shows that the proposed method outperforms both in terms of F-measure: 61% for the proposed method, versus 54% for Naïve Bayes and 56% for Logistic Regression.
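To make the roles of alpha, beta, the number of topics, and perplexity more concrete, the following is a minimal sketch of LDA fitted by collapsed Gibbs sampling, using only the Python standard library. It is an illustration under stated assumptions, not the authors' implementation: the abstract does not say which inference procedure was used, the threshold of 10^-7 is not modeled here, and the toy comments, the function names (fit_lda, perplexity, top_words), and the choice of 2 topics are all hypothetical. The per-topic word probabilities returned by top_words are the kind of weights a wordcloud could be drawn from.

```python
import math
import random


def fit_lda(docs, num_topics, alpha, beta, iterations=100, seed=0):
    """Fit LDA by collapsed Gibbs sampling (an assumed inference method)."""
    rng = random.Random(seed)
    vocab = sorted({w for doc in docs for w in doc})
    word_id = {w: i for i, w in enumerate(vocab)}
    V = len(vocab)
    doc_topic = [[0] * num_topics for _ in docs]        # words in doc d assigned to topic k
    topic_word = [[0] * V for _ in range(num_topics)]   # times word v is assigned to topic k
    topic_total = [0] * num_topics                      # words assigned to topic k overall
    assignments = []
    # Start from a random topic assignment for every token.
    for d, doc in enumerate(docs):
        zs = []
        for w in doc:
            k = rng.randrange(num_topics)
            zs.append(k)
            doc_topic[d][k] += 1
            topic_word[k][word_id[w]] += 1
            topic_total[k] += 1
        assignments.append(zs)
    for _ in range(iterations):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                v = word_id[w]
                k = assignments[d][i]
                # Remove the token's current assignment from the counts.
                doc_topic[d][k] -= 1
                topic_word[k][v] -= 1
                topic_total[k] -= 1
                # Conditional probability of each topic for this token.
                weights = [
                    (doc_topic[d][t] + alpha)
                    * (topic_word[t][v] + beta)
                    / (topic_total[t] + V * beta)
                    for t in range(num_topics)
                ]
                k = rng.choices(range(num_topics), weights=weights)[0]
                assignments[d][i] = k
                doc_topic[d][k] += 1
                topic_word[k][v] += 1
                topic_total[k] += 1
    # Smoothed estimates of topic-word (phi) and document-topic (theta) distributions.
    phi = [
        [(topic_word[k][v] + beta) / (topic_total[k] + V * beta) for v in range(V)]
        for k in range(num_topics)
    ]
    theta = [
        [(doc_topic[d][k] + alpha) / (len(doc) + num_topics * alpha)
         for k in range(num_topics)]
        for d, doc in enumerate(docs)
    ]
    return vocab, phi, theta


def perplexity(docs, vocab, phi, theta):
    """exp of the negative average per-token log-likelihood (lower is better)."""
    word_id = {w: i for i, w in enumerate(vocab)}
    log_lik, n_tokens = 0.0, 0
    for d, doc in enumerate(docs):
        for w in doc:
            v = word_id[w]
            p = sum(theta[d][k] * phi[k][v] for k in range(len(phi)))
            log_lik += math.log(p)
            n_tokens += 1
    return math.exp(-log_lik / n_tokens)


def top_words(vocab, topic_probs, n):
    """Most probable words of one topic, usable as wordcloud weights."""
    ranked = sorted(zip(vocab, topic_probs), key=lambda pair: pair[1], reverse=True)
    return ranked[:n]


# Hypothetical toy comments, already tokenized; not data from the paper.
docs = [
    "lecturer helpful friendly campus clean".split(),
    "library clean campus friendly".split(),
    "wifi slow registration slow queue".split(),
    "parking crowded wifi slow".split(),
]
# alpha and beta follow the abstract; 2 topics is a toy choice (the paper reports 9).
vocab, phi, theta = fit_lda(docs, num_topics=2, alpha=0.1, beta=0.1)
print("perplexity:", perplexity(docs, vocab, phi, theta))
for k, topic_probs in enumerate(phi):
    print("topic", k, top_words(vocab, topic_probs, 3))
```

In a parameter search of the kind the abstract describes, one would presumably repeat the fit over several values of alpha, beta, and the number of topics and keep the combination with the lowest perplexity. Labeling each resulting topic as positive or negative is a separate step that this sketch does not cover.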