Authors: Medha Khurana, Anurag Gulati, Saurabh Singh
Published in: 2018 Fifth International Conference on Parallel, Distributed and Grid Computing (PDGC), 2018-12-01
DOI: 10.1109/PDGC.2018.8745748
Sentiment Analysis Framework of Twitter Data Using Classification
Text mining is the process of exploring and analysing large amounts of unstructured text to identify concepts, patterns, topics, keywords and other attributes in the data. Twitter is a forum that lets people across the world post and exchange their views on the major and minor issues that arise every day. Microblogging on Twitter attracts data researchers because there is immense scope for mining and analysing its large volume of unstructured data in several ways. In this paper, various algorithms for analysing the sentiment of tweets are discussed, and their performance is compared on certain metrics. Challenges encountered during the study are also described, in terms of improvement and future scope. Because the machine learning algorithms were applied to a previously unexplored dataset, language barriers to these algorithms are also identified, in terms of both future scope and current feasibility. The analysis was performed using three classification algorithms: Naïve Bayes, Support Vector Machine and Random Forest. The experimental work was carried out in Python, and Excel was used to further evaluate and plot some of the results. Since the sentiment of the tweets is not known in advance, the test set was prepared manually to prevent errors when evaluating the accuracy and precision of the models.
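As a rough illustration of the kind of pipeline the abstract describes, the sketch below trains one of the three named classifiers (Naïve Bayes) and scores it on a hand-labelled test set using accuracy and precision. It is not the authors' implementation: the whitespace tokenizer, the add-one smoothing, the `pos`/`neg` label names and the toy tweets are all assumptions made for the example, and the abstract does not say how the paper's models were built or configured.

```python
import math
from collections import Counter, defaultdict


def tokenize(text):
    # Assumption: lowercase and split on whitespace; real tweets need more cleaning.
    return text.lower().split()


class NaiveBayes:
    """Multinomial Naive Bayes with add-one (Laplace) smoothing."""

    def fit(self, texts, labels):
        self.class_counts = Counter(labels)       # documents per class
        self.word_counts = defaultdict(Counter)   # word frequencies per class
        for text, label in zip(texts, labels):
            self.word_counts[label].update(tokenize(text))
        self.vocab = {w for counts in self.word_counts.values() for w in counts}
        return self

    def predict(self, text):
        total_docs = sum(self.class_counts.values())
        best_label, best_score = None, -math.inf
        for label, n_docs in self.class_counts.items():
            score = math.log(n_docs / total_docs)  # log prior
            total_words = sum(self.word_counts[label].values())
            for word in tokenize(text):
                if word not in self.vocab:
                    continue  # skip words never seen in training
                count = self.word_counts[label][word]
                score += math.log((count + 1) / (total_words + len(self.vocab)))
            if score > best_score:
                best_label, best_score = label, score
        return best_label


def accuracy(y_true, y_pred):
    # Fraction of predictions that match the manual labels.
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def precision(y_true, y_pred, positive="pos"):
    # Of the items predicted positive, the fraction that are truly positive.
    truths_for_predicted_pos = [t for t, p in zip(y_true, y_pred) if p == positive]
    if not truths_for_predicted_pos:
        return 0.0
    return sum(t == positive for t in truths_for_predicted_pos) / len(truths_for_predicted_pos)


# Hypothetical toy data, invented for this example (not from the paper's dataset).
train_texts = [
    "i love this phone",
    "what a great day",
    "i hate waiting",
    "this is a terrible service",
]
train_labels = ["pos", "pos", "neg", "neg"]

# Manually labelled test set, mirroring the evaluation setup in the abstract.
test_texts = ["i love this day", "terrible waiting"]
test_labels = ["pos", "neg"]

model = NaiveBayes().fit(train_texts, train_labels)
predictions = [model.predict(t) for t in test_texts]
print("accuracy:", accuracy(test_labels, predictions))
print("precision:", precision(test_labels, predictions))
```

The same `accuracy` and `precision` helpers could be applied to predictions from a Support Vector Machine or Random Forest to put the three classifiers on a common footing, which is the comparison the abstract refers to.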