Combining SentiStrength and Multilayer Perceptron in Twitter Sentiment Classification

2019 International Seminar on Intelligent Technology and Its Applications (ISITIA) | Pub Date: 2019-08-01 | DOI: 10.1109/ISITIA.2019.8937134

Eko Yudhi Prastowo, Endroyono, E. M. Yuniarno


Citation count: 3

Abstract

The advancement of internet technology has made social media part of people's everyday lifestyle. Companies and governments use social media as an instant feedback channel to gauge user sentiment from comments and reviews. A sentiment is an opinion or view based on excessive feelings towards something. Whether a comment expresses positive or negative sentiment can be determined manually, by having humans analyze comments one by one, or automatically, by classifying them with machine learning. Machine learning requires training data and test data that carry positive and negative labels, and this labeling is generally done manually by humans. In this study, we used machine learning to classify the sentiment of data collected from Twitter. The machine learning method used is Multilayer Perceptron, with Naive Bayes as a comparison. The dataset was labeled manually. As additional training data, labeled data was generated from unlabeled data using an English lexicon dictionary called SentiStrength. Feature extraction uses vectorization and TF-IDF. This study aims to measure how adding SentiStrength-generated training data during the learning process affects the accuracy of the machine learning model. The classification models were tested on 627 tweets. The results show that adding the training data increased average accuracy by 5% of the initial accuracy. Multilayer Perceptron was more accurate than Naive Bayes, with highest accuracies of 77.71% and 76.07%, respectively.
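The data flow described in the abstract (lexicon-based labeling of unlabeled tweets, merging them with manually labeled tweets, then TF-IDF feature extraction) might look roughly like the sketch below. This is an illustration under stated assumptions, not the authors' implementation. The toy lexicon, the rule of summing the strongest positive and strongest negative word scores, the choice to discard undecided tweets, the TF-IDF weighting variant, and all function names are hypothetical. The real SentiStrength tool uses a much larger dictionary and additional rules. Classifier training (Multilayer Perceptron or Naive Bayes) is left out.

```python
import math
from collections import Counter

# Hypothetical toy lexicon; the real SentiStrength dictionary is far larger.
TOY_LEXICON = {"love": 3, "great": 2, "good": 1, "bad": -1, "awful": -3, "hate": -3}


def tokenize(text):
    """Lowercase, split on whitespace, and strip surrounding punctuation."""
    tokens = (word.strip(".,!?;:\"'()") for word in text.lower().split())
    return [tok for tok in tokens if tok]


def lexicon_label(text, lexicon=TOY_LEXICON):
    """Return 'positive', 'negative', or None when the lexicon is undecided."""
    scores = [lexicon.get(tok, 0) for tok in tokenize(text)]
    strongest_pos = max((s for s in scores if s > 0), default=0)
    strongest_neg = min((s for s in scores if s < 0), default=0)
    total = strongest_pos + strongest_neg
    if total > 0:
        return "positive"
    if total < 0:
        return "negative"
    return None


def augment(labeled, unlabeled):
    """Append lexicon-labeled tweets to the manually labeled training set.

    Tweets the lexicon cannot decide on are dropped (an assumption).
    """
    extra = [(text, lexicon_label(text)) for text in unlabeled]
    return labeled + [(text, label) for text, label in extra if label is not None]


def tfidf_vectors(texts):
    """Sparse TF-IDF vectors, one dict per text, with idf = ln(N / df) + 1."""
    docs = [tokenize(text) for text in texts]
    n_docs = len(docs)
    doc_freq = Counter(tok for doc in docs for tok in set(doc))
    vectors = []
    for doc in docs:
        counts = Counter(doc)
        vectors.append({
            tok: (count / len(doc)) * (math.log(n_docs / doc_freq[tok]) + 1.0)
            for tok, count in counts.items()
        })
    return vectors


# Usage with made-up tweets (not data from the study).
manual = [("I love this phone", "positive"), ("Awful service, I hate it", "negative")]
unlabeled = ["Great battery and good screen", "bad update", "Flight lands at noon"]
train = augment(manual, unlabeled)
features = tfidf_vectors([text for text, _ in train])
```

Here `train` holds the two manually labeled tweets plus the two unlabeled tweets the toy lexicon could decide on, and `features` holds one sparse vector per training tweet that a classifier could consume.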

