Title: Pengaruh Parameter Word2Vec terhadap Performa Deep Learning pada Klasifikasi Sentimen (The Effect of Word2Vec Parameters on Deep Learning Performance in Sentiment Classification)
Authors: Dwi Intan Af’idah, Dairoh Dairoh, Sharfina Febbi Handayani, Riszki Wijayatun Pratiwi
Journal: Jurnal Informatika Jurnal Pengembangan IT
Publication date: 2021-10-01
Publication type: Journal Article
DOI: 10.30591/JPIT.V6I3.3016
URL: https://doi.org/10.30591/JPIT.V6I3.3016
Citations: 6

Abstract

Deep learning can overcome the difficulty of sentiment classification on big data. Before a deep learning model is trained and tested, word features must be extracted. Word2Vec is often used for this feature extraction step in sentiment classification pre-training because it captures the semantic meaning of text by assigning similar vectors to words with similar meanings. Word2Vec has three parameters that affect the model's learning process: architecture, evaluation method, and dimension. This study aims to determine the effect of each Word2Vec parameter on deep learning performance in sentiment classification, using the accuracy of the deep learning model as the measure. The results indicate that all three Word2Vec parameters influence the performance of the deep learning model. The combination that produces the highest average accuracy is the CBOW (Continuous Bag of Words) architecture, the Hierarchical Softmax evaluation method, and a dimension of 100. CBOW performs better because it is slightly more accurate for frequently occurring words, and this study's dataset contains many such words. Hierarchical Softmax gives better results because it uses a binary tree model in which rarely occurring words inherit the vector representations of the nodes above them. A dimension of 100 gives better accuracy because it suits the dataset size of 10,000 reviews.
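
The three parameters named in the abstract can be pictured as a small experiment grid. The sketch below is only an illustration and not the authors' code: the abstract reports the best combination but not the full set of values tested, so skip-gram, negative sampling, and the dimensions 200 and 300 are assumptions, and `word2vec_grid` is a hypothetical helper. The keyword names (`sg`, `hs`, `negative`, `vector_size`) follow the gensim 4.x `Word2Vec` constructor, which is also an assumption about tooling.

```python
from itertools import product

# Assumed option sets. The abstract names only the best combination
# (CBOW, Hierarchical Softmax, dimension 100), not the full grid.
ARCHITECTURES = {"cbow": 0, "skipgram": 1}  # value of gensim's `sg` flag
METHODS = {"hierarchical_softmax": 1, "negative_sampling": 0}  # gensim's `hs` flag
DIMENSIONS = [100, 200, 300]  # hypothetical values; only 100 is from the abstract


def word2vec_grid(negative_samples=5):
    """Return one labelled keyword-argument dict per parameter combination."""
    grid = []
    for (arch, sg), (method, hs), dim in product(
        ARCHITECTURES.items(), METHODS.items(), DIMENSIONS
    ):
        grid.append({
            "label": f"{arch}/{method}/{dim}",
            "kwargs": {
                "sg": sg,
                "hs": hs,
                # Negative sampling is disabled when hierarchical softmax is on.
                "negative": 0 if hs == 1 else negative_samples,
                "vector_size": dim,
            },
        })
    return grid


if __name__ == "__main__":
    for entry in word2vec_grid():
        print(entry["label"], entry["kwargs"])
```

With gensim 4.x installed, each entry could be passed on as `Word2Vec(sentences, **entry["kwargs"])`, after which the resulting embeddings would feed the deep learning classifier whose accuracy is compared across combinations.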