Combination of Lexical Resources and Support Vector Machine for Film Sentiment Analysis

Putri Agustina, Raissa Amanda Putri
Sinkron (journal article), published 2024-07-01. DOI: 10.33395/sinkron.v8i3.13733

Abstract

Text data generated by internet users holds potentially valuable information that can be analyzed for new insights. One strategy for extracting information from a text dataset is to classify texts into predetermined categories based on existing data. Text classification is a core task in text mining, and one popular approach uses the Support Vector Machine (SVM) classification algorithm, which separates data into different classes. In some cases, however, an SVM classifier may fail to capture the context of a text because of ambiguous wording, varied sentence structures, or difficulty interpreting the intended meaning. Applying SVM classification together with lexical resources can be an effective solution to this problem. In this research framework, the first step is data collection: a film review dataset taken from kaggle.com. The data are then preprocessed, and the preprocessed data are split 80:20 into training and test sets. The 80% training portion is used to derive polarity labels with the lexicon, and this lexicon-labelled training data is used to train the SVM model. Based on the modeling results, the overall accuracy, computed from the confusion matrix, is around 85%. Precision is the proportion of predictions for a class that are correct. It reached 88% for positive predictions, 80% for negative predictions, and 0% for neutral predictions. These results show that the Lexicon+SVM model performs well, with an accuracy of 85%.
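The following is a minimal, standard-library Python sketch of two steps in this pipeline. The first step assigns polarity labels by summing lexicon scores. The second computes accuracy and per-class precision from a confusion matrix.

This sketch makes several assumptions:
- The lexicon, review strings, and prediction lists are hypothetical placeholders, not the paper's lexical resource, data, or results.
- The SVM training step is omitted. In practice, a library classifier such as scikit-learn's `LinearSVC` would typically fill that role.

The precision function also shows how a class can score 0%: its precision is zero when it is never predicted or when every prediction of it is wrong.

```python
import random
from collections import Counter

# Hypothetical toy lexicon (word -> polarity score). The paper's actual
# lexical resource is not specified here; swap in a real sentiment lexicon.
LEXICON = {"good": 1, "great": 2, "love": 2, "bad": -1, "boring": -2, "awful": -2}
LABELS = ["positive", "negative", "neutral"]


def lexicon_label(tokens, lexicon=LEXICON):
    """Assign a polarity label by summing the lexicon scores of the tokens."""
    score = sum(lexicon.get(tok, 0) for tok in tokens)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"  # no lexicon hits, or positive and negative words cancel out


def split_80_20(items, seed=0):
    """Shuffle a copy of the data and split it 80:20 into train and test sets."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)
    cut = int(len(shuffled) * 0.8)
    return shuffled[:cut], shuffled[cut:]


def confusion_matrix(y_true, y_pred, labels=LABELS):
    """Map every (actual, predicted) pair to its count, including zero cells."""
    counts = Counter(zip(y_true, y_pred))
    return {(a, p): counts[(a, p)] for a in labels for p in labels}


def accuracy(cm, labels=LABELS):
    """Correct predictions (the diagonal) divided by all predictions."""
    total = sum(cm.values())
    return sum(cm[(c, c)] for c in labels) / total if total else 0.0


def precision(cm, cls, labels=LABELS):
    """TP / (TP + FP) for one class: of all items predicted as `cls`, the
    fraction whose actual label is `cls`. Returns 0.0 when `cls` is never
    predicted, since the ratio is then undefined."""
    predicted_as_cls = sum(cm[(a, cls)] for a in labels)
    return cm[(cls, cls)] / predicted_as_cls if predicted_as_cls else 0.0


reviews = ["great film love it", "boring and awful", "good but bad ending", "the film exists"]
print([lexicon_label(r.split()) for r in reviews])

train, test = split_80_20(list(range(10)))
print(len(train), len(test))

# Hypothetical test labels and classifier predictions, used only to
# illustrate the metrics (not the paper's data or results).
y_true = ["positive"] * 4 + ["negative"] * 2 + ["neutral"]
y_pred = ["positive"] * 3 + ["negative"] * 3 + ["positive"]
cm = confusion_matrix(y_true, y_pred)
print(f"accuracy={accuracy(cm):.3f}")
for label in LABELS:
    print(f"precision[{label}]={precision(cm, label):.3f}")
```

With these toy inputs, the sketch produces the following results:
- The lexicon labels are `['positive', 'negative', 'neutral', 'neutral']`.
- The split yields 8 training items and 2 test items.
- Accuracy is 0.714 (5 of 7 correct).
- Precision is 0.750 for positive, 0.667 for negative, and 0.000 for neutral, because nothing is predicted as neutral.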