COVID-19 Sentiment Analysis using K-Means and DBSCAN
Smitesh D. Patravali, D. S. P. Algur
International Journal of Emerging Science and Engineering, published 2023-11-30
DOI: https://doi.org/10.35940/ijese.l2558.11111223
Citations: 0
Abstract
Analyzing sentiment towards COVID-19 plays a crucial role in understanding public opinion. This paper proposes sentiment analysis using the K-means and DBSCAN clustering algorithms on a dataset of tweets related to COVID-19. Pre-processing and feature extraction are carried out using Term Frequency-Inverse Document Frequency (TF-IDF) to capture the weight of words in the dataset. K-means clustering is used to group similar sentiments together, enabling the identification of sentiment clusters related to COVID-19. The DBSCAN algorithm is then employed to identify outliers and noise in the sentiment clusters. The evaluation metrics considered were accuracy, precision, recall, and F1-score. DBSCAN was observed to identify the underlying patterns in the data more accurately than K-means.
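
The pipeline described in the abstract (TF-IDF features, then K-means, then DBSCAN) can be illustrated with a minimal sketch. It assumes scikit-learn, which the abstract does not name. The toy tweets, the function name `cluster_tweets`, and every parameter value (`n_clusters`, `eps`, `min_samples`, the cosine metric) are hypothetical choices for illustration, not the authors' settings or data.

```python
from sklearn.cluster import DBSCAN, KMeans
from sklearn.feature_extraction.text import TfidfVectorizer

# Hypothetical toy tweets; the paper's dataset is not reproduced here.
tweets = [
    "vaccines give me hope that the pandemic will end",
    "so grateful for the nurses and doctors, hope is back",
    "lockdown again, I am tired and angry",
    "angry about the lockdown, tired of this pandemic",
    "buy cheap sunglasses online now",
]


def cluster_tweets(texts, n_clusters=2, eps=0.9, min_samples=2):
    """Vectorize texts with TF-IDF, then cluster with K-means and DBSCAN.

    All parameter defaults are assumptions chosen for the toy data above.
    """
    # TF-IDF weights each term by its frequency in a tweet, discounted by
    # how many tweets contain it. English stop words are dropped.
    vectorizer = TfidfVectorizer(stop_words="english")
    features = vectorizer.fit_transform(texts)

    # K-means assigns every tweet to one of n_clusters groups.
    kmeans = KMeans(n_clusters=n_clusters, n_init=10, random_state=0)
    kmeans_labels = kmeans.fit_predict(features)

    # DBSCAN groups tweets by density; tweets with too few neighbours
    # within eps (cosine distance) receive the noise label -1.
    dbscan = DBSCAN(eps=eps, min_samples=min_samples, metric="cosine")
    dbscan_labels = dbscan.fit_predict(features)

    return features, kmeans_labels, dbscan_labels


if __name__ == "__main__":
    _, kmeans_labels, dbscan_labels = cluster_tweets(tweets)
    for text, k_label, d_label in zip(tweets, kmeans_labels, dbscan_labels):
        print(f"kmeans={k_label} dbscan={d_label} | {text}")
```

In this sketch, K-means must place every tweet in some cluster, while DBSCAN can mark a tweet as noise. The last toy tweet shares no terms with the others, so its cosine distance to each of them is 1.0, which exceeds the assumed `eps` of 0.9. Computing the accuracy, precision, recall, and F1-score named in the abstract would additionally require ground-truth sentiment labels and a mapping from clusters to sentiment classes, neither of which is specified here.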