Sentiment Analysis for IMDb Movie Review Using Support Vector Machine (SVM) Method

Inform Jurnal Ilmiah Bidang Teknologi Informasi dan Komunikasi | Pub Date: 2023-03-18 | DOI: 10.25139/inform.v8i2.5700

D. D. Nur Cahyo, F. Farasalsabila, Verra Budhi Lestari, Hanafi, Tutik Lestari, Fahmi Rusdi Al Islami, M. A. Maulana


Citations: 0

Abstract

Many researchers currently employ supervised machine learning methods to study sentiment analysis. Analysis can be done on movie reviews, Twitter posts, online product reviews, blogs, discussion forums, Myspace comments, and social networks. Support Vector Machine (SVM) classifiers have been used to analyze Twitter data sets under different parameter settings. In this study, SVM was successfully implemented on IMDb data. The data set contains 50,000 reviews, 25,000 positive and 25,000 negative. After a preprocessing phase that filtered the data, 40,000 reviews were used as training data and 10,000 reviews as testing data for SVM classification. We adopted evaluation metrics including accuracy, precision, recall, and F1-score. In our experiments, SVM with Bag of Words (BoW) features reached a highest test accuracy of 88.59%. We then used grid search to find the best parameters for the SVM model. With Term Frequency-Inverse Document Frequency (TF-IDF) features, the model reached the highest test accuracy, 91.27%.
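The abstract names two feature representations (BoW and TF-IDF) and four evaluation metrics without defining them. The following is a minimal sketch of those definitions in plain Python, not the authors' implementation. The function names, the whitespace tokenizer, and the toy inputs are all hypothetical, and the TF-IDF formula shown (tf * ln(N / df)) is one common textbook variant chosen as an assumption; library implementations often use smoothed variants, and the paper's exact weighting is not stated here.

```python
import math
from collections import Counter


def bag_of_words(doc):
    """Raw term counts for one lowercased, whitespace-tokenized document."""
    return Counter(doc.lower().split())


def tf_idf(docs):
    """TF-IDF weights per document, assuming the variant tf * ln(N / df)."""
    counts = [bag_of_words(d) for d in docs]
    n_docs = len(docs)
    # Document frequency: the number of documents each term appears in.
    df = Counter(term for c in counts for term in c)
    return [
        {term: tf * math.log(n_docs / df[term]) for term, tf in c.items()}
        for c in counts
    ]


def binary_metrics(y_true, y_pred, positive=1):
    """Accuracy, precision, recall, and F1 for a two-class problem."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    tn = sum(t != positive and p != positive for t, p in pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return {
        "accuracy": (tp + tn) / len(pairs),
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }


if __name__ == "__main__":
    # Toy reviews and labels, invented purely for illustration.
    reviews = ["a great great movie", "a dull movie"]
    print(bag_of_words(reviews[0]))
    print(tf_idf(reviews))
    print(binary_metrics([1, 1, 0, 0, 1], [1, 0, 0, 1, 1]))
```

Under this variant, a term that occurs in every document (such as "movie" in the toy input) gets weight zero, while a term concentrated in one review (such as "great") keeps a positive weight. That down-weighting of ubiquitous terms is the main difference from raw BoW counts.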




Source journal: Inform Jurnal Ilmiah Bidang Teknologi Informasi dan Komunikasi

Self-citation rate: 0.00%

Review time: 10 weeks