Sentiment analysis using multinomial logistic regression

2017 International Conference on Control, Electronics, Renewable Energy and Communications (ICCEREC). Publication date: 2017-09-01. DOI: 10.1109/ICCEREC.2017.8226700

W. Ramadhan, S. Novianty, S. Setianingsih


Citations: 55

Abstract




The volume of data is growing rapidly. Data can take the form of text, images, audio, and video. Social media is one driver of this growth, as people use it to express themselves, share opinions, and even complain. In this work, the first step is data collection through the Twitter API, using the name of each candidate in the Jakarta governor election. The collected data then serve as input to a preprocessing step. Next, the features of each tweet are extracted and listed. The feature list is transformed into a binary feature vector and then transformed again using the TF-IDF method. The dataset consists of two kinds of data, training and testing; the training data were labeled manually. K-fold cross-validation is used to evaluate the algorithm's performance. In testing, accuracy reached 74% on average with a 90:10 split between training and testing data. Changing the number of folds had no impact on accuracy.
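The pipeline described in the abstract (binary feature vectors, TF-IDF reweighting, a multinomial logistic regression classifier, and k-fold cross-validation) can be sketched in plain Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the toy tweets, the vocabulary, the three-class label scheme, the `ln(N/df) + 1` idf variant, the L2 normalisation, and all function names and hyperparameters are hypothetical. For brevity the idf weights are computed on the whole dataset before splitting, which a careful evaluation would avoid.

```python
import math
import random

def binary_vectors(docs, vocab):
    """One row per tweet: 1.0 if the vocabulary term occurs, else 0.0."""
    rows = []
    for doc in docs:
        tokens = set(doc.split())
        rows.append([1.0 if term in tokens else 0.0 for term in vocab])
    return rows

def tfidf(binary):
    """Reweight binary vectors by idf = ln(N / df) + 1, then L2-normalise."""
    n = len(binary)
    df = [sum(col) for col in zip(*binary)]
    idf = [math.log(n / d) + 1.0 if d else 0.0 for d in df]
    out = []
    for row in binary:
        weighted = [x * w for x, w in zip(row, idf)]
        norm = math.sqrt(sum(v * v for v in weighted)) or 1.0
        out.append([v / norm for v in weighted])
    return out

def softmax(scores):
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def train(X, y, n_classes, epochs=200, lr=0.5):
    """Batch gradient descent on the softmax cross-entropy loss."""
    d = len(X[0])
    W = [[0.0] * (d + 1) for _ in range(n_classes)]  # last column is the bias
    for _ in range(epochs):
        grad = [[0.0] * (d + 1) for _ in range(n_classes)]
        for x, label in zip(X, y):
            xb = x + [1.0]
            p = softmax([sum(w * v for w, v in zip(Wk, xb)) for Wk in W])
            for k in range(n_classes):
                err = p[k] - (1.0 if k == label else 0.0)
                for j, v in enumerate(xb):
                    grad[k][j] += err * v
        for k in range(n_classes):
            for j in range(d + 1):
                W[k][j] -= lr * grad[k][j] / len(X)
    return W

def predict(W, x):
    xb = x + [1.0]
    scores = [sum(w * v for w, v in zip(Wk, xb)) for Wk in W]
    return scores.index(max(scores))

def k_fold_accuracy(X, y, n_classes, k=10, seed=0):
    """Mean held-out accuracy over k folds (k must not exceed len(X))."""
    idx = list(range(len(X)))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]
    accs = []
    for test in folds:
        held = set(test)
        tr = [i for i in idx if i not in held]
        W = train([X[i] for i in tr], [y[i] for i in tr], n_classes)
        accs.append(sum(predict(W, X[i]) == y[i] for i in test) / len(test))
    return sum(accs) / k

# Hypothetical toy data: 0 = negative, 1 = neutral, 2 = positive.
vocab = ["good", "great", "bad", "awful", "debate", "today"]
tweets = ["good great", "great today", "bad awful",
          "awful today", "debate today", "debate"]
labels = [2, 2, 0, 0, 1, 1]

X = tfidf(binary_vectors(tweets, vocab))
print("3-fold accuracy:", k_fold_accuracy(X, labels, n_classes=3, k=3))
```

With `k=10`, each fold trains on 90% of the data and tests on the remaining 10%, which corresponds to the 90:10 composition mentioned in the abstract; the toy example uses `k=3` only because it has six tweets.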

