Sentiment Analysis Framework of Twitter Data Using Classification

2018 Fifth International Conference on Parallel, Distributed and Grid Computing (PDGC), Pub Date: 2018-12-01, DOI: 10.1109/PDGC.2018.8745748

Medha Khurana, Anurag Gulati, Saurabh Singh

Citations: 7

Sentiment Analysis Framework of Twitter Data Using Classification

Text mining is the process of exploring and analysing large amounts of unstructured text to identify concepts, patterns, topics, keywords and other attributes in the data. Twitter is a forum that lets people across the world post and exchange their views and ideas on the major and minor issues that arise every day. Microblogging on Twitter attracts the interest of data researchers because it offers immense scope for mining and analysing huge amounts of unstructured data in several ways. This paper discusses various algorithms for analysing the sentiment of tweets and compares their performance on certain metrics. Challenges encountered during the study are also described in terms of improvement and future scope. Because the machine learning algorithms were applied to an unexplored dataset, language barriers to these algorithms are also identified in terms of their future scope and current feasibility. The analysis uses three classification algorithms: Naïve Bayes, Support Vector Machine and Random Forest. The experimental work was carried out in Python, and Excel was used to further evaluate and plot some of the results. Since the sentiment of the tweets is not known in advance, the test set was prepared manually to prevent errors in evaluating the accuracy and precision of the models.
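
The workflow summarised above (train a classifier on labelled tweets, then score it on a manually labelled test set using accuracy and precision) can be illustrated with a minimal sketch. This is not the authors' code: the toy tweets, the `pos`/`neg` labels, the whitespace tokenizer and every function name are assumptions made for illustration. Only Naïve Bayes is shown, written with the Python standard library; the Support Vector Machine and Random Forest models named in the abstract would usually come from a library such as scikit-learn.

```python
import math
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase and split on whitespace; real tweets would need more cleaning."""
    return text.lower().split()


def train_naive_bayes(samples):
    """Count documents per class and words per class from (text, label) pairs."""
    class_counts = Counter()
    word_counts = defaultdict(Counter)
    vocab = set()
    for text, label in samples:
        class_counts[label] += 1
        for tok in tokenize(text):
            word_counts[label][tok] += 1
            vocab.add(tok)
    return class_counts, word_counts, vocab


def predict(model, text):
    """Return the label with the highest log-posterior (add-one smoothing)."""
    class_counts, word_counts, vocab = model
    total_docs = sum(class_counts.values())
    best_label, best_score = None, -math.inf
    for label in sorted(class_counts):
        score = math.log(class_counts[label] / total_docs)
        total_words = sum(word_counts[label].values())
        for tok in tokenize(text):
            if tok not in vocab:
                continue  # ignore words never seen during training
            score += math.log(
                (word_counts[label][tok] + 1) / (total_words + len(vocab))
            )
        if score > best_score:
            best_label, best_score = label, score
    return best_label


def accuracy(y_true, y_pred):
    """Fraction of test items whose predicted label matches the manual label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def precision(y_true, y_pred, positive="pos"):
    """Of the items predicted as `positive`, the fraction that truly are."""
    truths_of_predicted = [t for t, p in zip(y_true, y_pred) if p == positive]
    if not truths_of_predicted:
        return 0.0
    return sum(t == positive for t in truths_of_predicted) / len(truths_of_predicted)


# Hypothetical toy data standing in for labelled tweets.
TRAIN = [
    ("i love this phone", "pos"),
    ("what a great day", "pos"),
    ("really happy with the service", "pos"),
    ("i hate this traffic", "neg"),
    ("what a terrible movie", "neg"),
    ("really bad service", "neg"),
]
# Hypothetical manually labelled test set.
TEST = [
    ("love the great service", "pos"),
    ("terrible traffic today", "neg"),
    ("happy day", "pos"),
    ("bad movie", "neg"),
]

model = train_naive_bayes(TRAIN)
y_true = [label for _, label in TEST]
y_pred = [predict(model, text) for text, _ in TEST]
print("accuracy:", accuracy(y_true, y_pred))
print("precision (pos):", precision(y_true, y_pred))
```

Keeping the manually labelled test set separate from the training data is what makes the reported accuracy and precision meaningful; on a toy set this small the numbers say nothing about real tweets.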
