Non-linear Kernel Optimisation of Support Vector Machine Algorithm for Online Marketplace Sentiment Analysis

A. Fadlil, Imam Riadi, Fiki Andrianto
Journal: JUITA : Jurnal Informatika
DOI: 10.30595/juita.v12i1.19798 (https://doi.org/10.30595/juita.v12i1.19798)
Publication date: 2024-05-20

Abstract

Twitter is a prominent social media platform, and the speed of communication and interaction on it makes it a valuable source of data for sentiment analysis. This research classifies public opinion about online marketplaces in Indonesia as positive or negative, using a non-linear SVM algorithm on 1276 tweets. The workflow consists of data pre-processing, labeling, feature extraction using TF-IDF, and splitting the data under three scenarios: 80% training and 20% test data, 50% training and 50% test data, and 20% training and 80% test data. In the final stage, GridSearchCV combines cross-validation with a search over the non-linear SVM hyperparameters, and the resulting models are evaluated using a confusion matrix. The best SVM model came from the 80% training and 20% test scenario, with hyperparameters Gamma = 100 and C = 0.01, and achieved 89% accuracy. When tested on never-before-seen data, accuracy rose to 90%, with an F1-score of 91%, precision of 88%, and recall of 95% on negative sentiments. In conclusion, evaluating non-linear SVM kernels across combinations of hyperparameter values can improve accuracy when classifying public sentiment about online marketplaces.
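To make the reported figures easier to interpret, here is a minimal Python sketch using only the standard library. It assumes the RBF (Gaussian) kernel, which the abstract does not name explicitly; the vectors `u` and `v` and the helper names `rbf_kernel` and `binary_metrics` are hypothetical and not taken from the paper. It shows how Gamma scales kernel similarity, how precision, recall, and F1 follow from confusion-matrix counts, and that the reported F1-score is consistent with the reported precision and recall.

```python
import math


def rbf_kernel(x, z, gamma):
    """RBF (Gaussian) kernel: exp(-gamma * ||x - z||^2)."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, z))
    return math.exp(-gamma * sq_dist)


def binary_metrics(tp, fp, fn, tn):
    """Accuracy, precision, recall and F1 for the class counted as 'positive'.

    tp, fp, fn, tn are the four cells of a 2x2 confusion matrix.
    """
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    return accuracy, precision, recall, f1


# Consistency check on the figures the abstract reports for negative sentiment:
# F1 is the harmonic mean of precision and recall.
reported_precision, reported_recall = 0.88, 0.95
implied_f1 = (2 * reported_precision * reported_recall
              / (reported_precision + reported_recall))
print(f"F1 implied by reported precision and recall: {implied_f1:.3f}")

# Hypothetical unit-length feature vectors (TF-IDF rows are commonly
# L2-normalised); these are illustrative values, not data from the paper.
u = [0.0, 0.6, 0.8]
v = [0.6, 0.0, 0.8]
for gamma in (0.01, 1, 100):
    print(f"gamma={gamma}: k(u, v) = {rbf_kernel(u, v, gamma):.3e}")
```

Under this assumption, a large Gamma such as 100 drives the kernel value between even moderately different documents close to zero, so each training tweet influences only its immediate neighbourhood in feature space. The implied F1 rounds to 91%, in line with the value the abstract reports.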