Sentiment analysis on covid-19 vaccine using Twitter data: A NLP approach
Authors: Kainat Khan, Sachin Yadav
Published in: 2021 IEEE 9th Region 10 Humanitarian Technology Conference (R10-HTC), 2021-09-30
DOI: https://doi.org/10.1109/R10-HTC53172.2021.9641515
Sentiment analysis is the process of mining people's perception of a service, product, policy, or imminent issue from textual data. In this project, tweets relevant to the COVID-19 vaccine are extracted using the Tweepy library. The tweet texts are then converted into a usable form for sentiment analysis, and the SentiWordNet lexicon is used to label the sentiment of each tweet. Stop-word removal, lemmatization, and stemming are also applied to the tweet text. CountVectorizer and TfidfVectorizer are used to convert the preprocessed text into numerical feature vectors. Nine classification techniques, namely Multinomial-NB, Bernoulli-NB, Logistic Regression, Ridge Classifier, Passive-Aggressive Classifier, Perceptron, Random Forest classifier, AdaBoostClassifier, and Linear SVM, are then applied to the resulting dataset for sentiment classification, and the results are reported in terms of accuracy. The best cross-validation test score obtained is 0.785, with the Logistic Regression classifier and TfidfVectorizer.
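The preprocessing, lexicon-labeling, and vectorization steps described in the abstract can be illustrated with a minimal, standard-library-only sketch. It is not the authors' code. The tiny `TOY_LEXICON` and `STOP_WORDS` are hypothetical stand-ins for SentiWordNet and a full stop-word list, the example tweets are invented, and the `tfidf` function uses a simplified weighting, tf × (ln(N/df) + 1), rather than the exact formula of scikit-learn's TfidfVectorizer. Stemming, lemmatization, and the classifiers are left out.

```python
import math
import re
from collections import Counter

# Hypothetical toy lexicon standing in for SentiWordNet:
# word -> (positive score, negative score). Values are made up.
TOY_LEXICON = {
    "safe": (0.75, 0.0),
    "effective": (0.625, 0.0),
    "good": (0.75, 0.0),
    "scared": (0.0, 0.75),
    "bad": (0.0, 0.625),
    "painful": (0.0, 0.75),
}

# Small illustrative stop-word list; a real run would use a fuller one.
STOP_WORDS = {"the", "a", "an", "is", "are", "was", "i", "and",
              "of", "to", "my", "it", "so"}


def clean_tweet(text):
    """Lowercase, strip URLs, mentions and '#', then drop stop words."""
    text = text.lower()
    text = re.sub(r"https?://\S+", " ", text)  # remove links
    text = re.sub(r"@\w+", " ", text)          # remove user mentions
    text = text.replace("#", " ")              # keep hashtag word, drop symbol
    tokens = re.findall(r"[a-z]+", text)       # alphabetic tokens only
    return [t for t in tokens if t not in STOP_WORDS]


def label_sentiment(tokens, lexicon=TOY_LEXICON):
    """Sum (positive - negative) scores; the sign of the total is the label."""
    score = sum(lexicon[t][0] - lexicon[t][1] for t in tokens if t in lexicon)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


def tfidf(docs):
    """Return one {term: weight} dict per document, tf * (ln(N / df) + 1)."""
    n_docs = len(docs)
    doc_freq = Counter(term for doc in docs for term in set(doc))
    weights = []
    for doc in docs:
        counts = Counter(doc)
        weights.append({t: c * (math.log(n_docs / doc_freq[t]) + 1.0)
                        for t, c in counts.items()})
    return weights


# Invented example tweets, for illustration only.
tweets = [
    "The vaccine is safe and effective! https://example.com #CovidVaccine",
    "@user I was so scared, my arm is painful",
    "Got my second dose today",
]
tokens = [clean_tweet(t) for t in tweets]
labels = [label_sentiment(t) for t in tokens]
weights = tfidf(tokens)

for tweet_tokens, label in zip(tokens, labels):
    print(label, tweet_tokens)
```

In a fuller pipeline, the lexicon-derived labels would serve as training targets and the TF-IDF vectors as features for classifiers such as logistic regression, evaluated with cross-validation as the abstract describes.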