A Sentiment Analysis Framework on COVID-19 in Major Cities of Malaysia based on Tweets using Machine Learning Classification Model

2021 IEEE 11th International Conference on System Engineering and Technology (ICSET) Pub Date : 2021-11-06 DOI:10.1109/ICSET53708.2021.9612527

Raihah Aminuddin, Muhammad Akmal Bistamam, Shafaf Ibrahim, Nur Nabilah Abu Mangshor, S. Fesol, Normilah Wahab


Abstract

Twitter is one of the most popular social media platforms on which people share their stories and opinions about current events, such as the COVID-19 pandemic. Given the indirect influence of tweets on users and the rise in COVID-19 cases in Malaysia, it is important to monitor information related to the pandemic in order to avoid misinformation, panic, or confusion among the public. Because tweets are also a useful source of raw data for data visualization, this project aims to design and develop a web-based system for visualizing the status of the pandemic in Malaysia based on data collected from Twitter. The methodology of this project has four phases: (i) Planning, (ii) Analysis, (iii) Design and Development, and (iv) Testing and Documentation. In the planning and analysis phases, the data will be collected from March 2020 to March 2021 and filtered using keywords and hashtags, such as #COVID19 and #Coronavirus, as well as the location tagged on the tweets. The collected data will then be pre-processed to remove unwanted text. The data are classified by sentiment using a machine learning model, the Support Vector Machine (SVM). The performance of the classification model will be evaluated using four metrics: (i) accuracy, (ii) recall, (iii) precision, and (iv) F1-measure. The final output of this project is a data visualization of the sentiment analysis on COVID-19 in Malaysia for two of its major cities: Kuala Lumpur and Klang.
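
The filtering, pre-processing, and evaluation steps described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not list the cleaning rules, so the regular expressions below are assumptions, and the function names (`matches_filter`, `clean_tweet`, `binary_metrics`) and the two-class label scheme are hypothetical. The SVM training step itself is left out; the sketch covers only what surrounds the classifier.

```python
import re

# The two hashtags and two cities named in the abstract; a real keyword
# list would likely be longer (assumption).
KEYWORDS = ("#covid19", "#coronavirus")
CITIES = ("kuala lumpur", "klang")


def matches_filter(text, location):
    """Keep a tweet that contains a keyword and is tagged in a target city."""
    lowered = text.lower()
    return any(k in lowered for k in KEYWORDS) and location.lower() in CITIES


def clean_tweet(text):
    """Strip URLs, @mentions, and punctuation (assumed cleaning rules)."""
    text = re.sub(r"https?://\S+", " ", text)      # drop links
    text = re.sub(r"@\w+", " ", text)              # drop user mentions
    text = re.sub(r"[^A-Za-z0-9\s]", " ", text)    # drop '#', punctuation, emoji
    return " ".join(text.lower().split())          # lowercase, collapse spaces


def binary_metrics(y_true, y_pred, positive="positive"):
    """Accuracy, precision, recall, and F1 with respect to one positive label."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    accuracy = sum(t == p for t, p in pairs) / len(pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}
```

In a full pipeline, the cleaned text would be turned into feature vectors and passed to an SVM classifier, and `binary_metrics` would then compare its predictions against held-out labels for each city.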


