Sentiment Analysis on COVID Tweets: An Experimental Analysis on the Impact of Count Vectorizer and TF-IDF on Sentiment Predictions using Deep Learning Models
G. Raza, Zainab Saeed Butt, Seemab Latif, Abdul Wahid
2021 International Conference on Digital Futures and Transformative Technologies (ICoDT2), published 2021-05-20. DOI: 10.1109/ICoDT252288.2021.9441508 (https://doi.org/10.1109/ICoDT252288.2021.9441508)
Citations: 17
Abstract
With the growing popularity and heavy use of social media, COVID-19 has been a dominant topic of discussion since 2019, and it has become a cause of stress, anxiety and depression for people around the world. In this article, we train several deep neural network classifiers on COVID-19 tweet data and examine how prediction accuracy depends on two popular text vectorization techniques: Count Vectorizer and Term Frequency-Inverse Document Frequency (TF-IDF). Finally, we compare accuracies and observe that TF-IDF is more effective than Count Vectorizer when datasets are very large. In our case, i.e., for COVID-19 tweets, the two vectorizers perform approximately the same, except on the Single Layer Perceptron, where Count Vectorizer yields about 10% higher accuracy.
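To make the difference between the two vectorizers concrete, the following is a minimal sketch in plain Python, not the authors' pipeline. The three tweets are invented for illustration, the whitespace tokenizer is a simplification, and the smoothed IDF formula, ln((1 + n) / (1 + df)) + 1 followed by L2 normalization, is an assumption chosen to mirror a common library default rather than anything stated in the abstract.

```python
import math
from collections import Counter


def tokenize(text):
    """Lowercase and split on whitespace (a deliberately simple tokenizer)."""
    return text.lower().split()


def build_vocabulary(docs):
    """Map each distinct token to a column index, in sorted order."""
    tokens = sorted({tok for doc in docs for tok in tokenize(doc)})
    return {tok: i for i, tok in enumerate(tokens)}


def count_vectors(docs, vocab):
    """Raw term counts: one row per document, one column per vocabulary term."""
    rows = []
    for doc in docs:
        counts = Counter(tokenize(doc))
        # vocab preserves insertion order, so columns follow the sorted indices.
        rows.append([counts.get(tok, 0) for tok in vocab])
    return rows


def tfidf_vectors(docs, vocab):
    """Counts reweighted by smoothed IDF, then L2-normalized per document.

    Assumed formula: idf(t) = ln((1 + n_docs) / (1 + df(t))) + 1.
    """
    counts = count_vectors(docs, vocab)
    n_docs = len(docs)
    # Document frequency: number of documents containing each term.
    df = [sum(1 for row in counts if row[j] > 0) for j in range(len(vocab))]
    idf = [math.log((1 + n_docs) / (1 + d)) + 1.0 for d in df]
    rows = []
    for row in counts:
        weighted = [c * w for c, w in zip(row, idf)]
        norm = math.sqrt(sum(v * v for v in weighted))
        rows.append([v / norm if norm > 0 else 0.0 for v in weighted])
    return rows


# Hypothetical toy tweets, not taken from the paper's dataset.
tweets = [
    "covid lockdown is stressful",
    "vaccine news gives hope",
    "covid covid cases rising again",
]

vocab = build_vocabulary(tweets)
counts = count_vectors(tweets, vocab)
tfidf = tfidf_vectors(tweets, vocab)

for tweet, c_row, t_row in zip(tweets, counts, tfidf):
    print(tweet)
    print("  counts:", c_row)
    print("  tf-idf:", [round(v, 3) for v in t_row])
```

In the count representation, "covid" and "lockdown" carry equal weight in the first tweet because each occurs once. Under TF-IDF, "covid" is down-weighted because it appears in two of the three tweets. Either matrix could then be fed to a classifier such as a single layer perceptron, which is the kind of comparison the abstract describes.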