Sentiment analysis using multinomial logistic regression

W. Ramadhan, S. Novianty, S. Setianingsih
DOI: 10.1109/ICCEREC.2017.8226700 (https://doi.org/10.1109/ICCEREC.2017.8226700)
Published in: 2017 International Conference on Control, Electronics, Renewable Energy and Communications (ICCEREC)
Publication date: 2017-09-01
Citations: 55

Abstract

The volume of data is increasing rapidly. Data can take the form of text, images, audio, and video. Social media is one driver of this growth, as people use it to express themselves, share opinions, and even complain. The first step is data collection: tweets are gathered through the Twitter API using the name of each candidate in the Jakarta gubernatorial election. The collected data then serve as input to a preprocessing step. Next, the features of each tweet are extracted and listed. The feature list is converted into a binary feature vector and then reweighted using the TF-IDF method. The dataset consists of training data and testing data; the training data were labeled manually. K-fold cross-validation is used to evaluate the algorithm's performance. In testing, accuracy averaged 74% with a 90:10 split between training and testing data. Changing the number of folds had no effect on accuracy.
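The pipeline described in the abstract (binary features, TF-IDF reweighting, multinomial logistic regression, K-fold cross-validation) can be sketched roughly as follows. This is a minimal illustration using only the Python standard library, not the authors' implementation. The whitespace tokenizer, the idf formula log(N / df), the gradient-descent settings, the interleaved fold assignment, and every function name here are assumptions, since the abstract does not specify them. For brevity the idf weights are computed once over all tweets rather than per training fold.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """Binary term presence per tweet, reweighted by idf = log(N / df) (assumed formula)."""
    token_sets = [set(d.lower().split()) for d in docs]  # crude stand-in for preprocessing
    vocab = sorted(set().union(*token_sets))
    df = Counter(t for terms in token_sets for t in terms)
    idf = {t: math.log(len(docs) / df[t]) for t in vocab}
    vectors = [[idf[t] if t in terms else 0.0 for t in vocab] for terms in token_sets]
    return vocab, vectors


def softmax(scores):
    """Numerically stable softmax over a list of class scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def class_scores(W, b, x):
    """Linear score per class: w_c . x + b_c."""
    return [sum(w * v for w, v in zip(row, x)) + bias for row, bias in zip(W, b)]


def train_softmax(X, y, n_classes, lr=0.5, epochs=200):
    """Multinomial logistic regression fitted by batch gradient descent (assumed optimizer)."""
    n_features = len(X[0])
    W = [[0.0] * n_features for _ in range(n_classes)]
    b = [0.0] * n_classes
    for _ in range(epochs):
        grad_W = [[0.0] * n_features for _ in range(n_classes)]
        grad_b = [0.0] * n_classes
        for xi, yi in zip(X, y):
            probs = softmax(class_scores(W, b, xi))
            for c in range(n_classes):
                err = probs[c] - (1.0 if c == yi else 0.0)  # gradient of cross-entropy loss
                grad_b[c] += err
                for j, v in enumerate(xi):
                    grad_W[c][j] += err * v
        for c in range(n_classes):
            b[c] -= lr * grad_b[c] / len(X)
            for j in range(n_features):
                W[c][j] -= lr * grad_W[c][j] / len(X)
    return W, b


def predict(W, b, x):
    """Return the index of the highest-scoring class."""
    scores = class_scores(W, b, x)
    return max(range(len(scores)), key=scores.__getitem__)


def k_fold_indices(n, k):
    """Yield (train, test) index lists; fold i tests on samples i, i+k, i+2k, ..."""
    for i in range(k):
        test = list(range(i, n, k))
        held_out = set(test)
        train = [j for j in range(n) if j not in held_out]
        yield train, test


def cross_validate(docs, labels, n_classes, k):
    """Mean accuracy over k folds; assumes len(docs) >= k so no test fold is empty."""
    _, X = tfidf_vectors(docs)
    accuracies = []
    for train, test in k_fold_indices(len(docs), k):
        W, b = train_softmax([X[i] for i in train], [labels[i] for i in train], n_classes)
        correct = sum(predict(W, b, X[i]) == labels[i] for i in test)
        accuracies.append(correct / len(test))
    return sum(accuracies) / len(accuracies)
```

With k = 10, each fold trains on 90% of the labeled tweets and tests on the remaining 10%, which matches the 90:10 composition reported in the abstract. A call such as `cross_validate(tweets, labels, n_classes=3, k=10)` would return the mean accuracy across folds, where `tweets`, `labels`, and the three-class labeling (for example negative, neutral, positive) are hypothetical placeholders for the manually labeled data.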