Sentiment Analysis for IMDb Movie Review Using Support Vector Machine (SVM) Method

D. D. Nur Cahyo, F. Farasalsabila, Verra Budhi Lestari, Hanafi, Tutik Lestari, Fahmi Rusdi Al Islami, M. A. Maulana
Journal: Inform Jurnal Ilmiah Bidang Teknologi Informasi dan Komunikasi
DOI: 10.25139/inform.v8i2.5700 (https://doi.org/10.25139/inform.v8i2.5700)
Publication date: 2023-03-18
Publication type: Journal Article

Abstract

Many researchers currently use supervised machine learning methods for sentiment analysis. Such analysis can be applied to movie reviews, Twitter reviews, online product reviews, blogs, discussion forums, Myspace comments, and social networks. Support Vector Machine (SVM) classifiers are used to analyze the Twitter data set with different parameters. The analysis and discussion in this study support the conclusion that SVM was successfully implemented on the IMDb data. The data set contains 50,000 reviews, of which 25,000 are positive and 25,000 are negative. After a preprocessing phase that filtered the data, 40,000 reviews were used as training data and 10,000 reviews as testing data for classification with SVM. We evaluated the model with four metrics: accuracy, precision, recall, and F1-score. In the experiments, SVM with Bag of Words (BoW) features reached a highest test accuracy of 88.59%. We then used grid search to find the best parameters for the SVM model. With Term Frequency–Inverse Document Frequency (TF-IDF) features, the model reached a highest test accuracy of 91.27%.
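The four evaluation metrics named in the abstract can be illustrated with a minimal sketch. This is not the authors' code: the function name `binary_metrics`, the label encoding (1 for a positive review, 0 for a negative one), and the toy labels are assumptions made only for illustration, and the numbers it produces are unrelated to the accuracies reported above.

```python
from typing import Dict, Sequence


def binary_metrics(y_true: Sequence[int], y_pred: Sequence[int]) -> Dict[str, float]:
    """Compute accuracy, precision, recall, and F1 for binary labels.

    Assumed encoding (hypothetical): 1 = positive review, 0 = negative review.
    """
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")

    # Count the four cells of the confusion matrix.
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)

    total = tp + tn + fp + fn
    # Guard every ratio against division by zero.
    accuracy = (tp + tn) / total if total else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (
        2 * precision * recall / (precision + recall)
        if (precision + recall)
        else 0.0
    )
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


# Toy example with made-up labels (not data from the study).
y_true = [1, 1, 1, 0, 0, 0, 1, 0]
y_pred = [1, 1, 0, 0, 0, 1, 1, 0]
scores = binary_metrics(y_true, y_pred)
print(scores)
```

On a balanced data set such as the one described (equal numbers of positive and negative reviews), accuracy is a reasonable headline figure; precision, recall, and F1 additionally show whether the errors fall mostly on one class.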