On Comparative Classification of Relevant Covid-19 Tweets

2021 6th International Conference on Computer Science and Engineering (UBMK). Pub Date: 2021-09-15. DOI: 10.1109/UBMK52708.2021.9558945

Gokhan Bakal, Orhan Abar

Citations: 1

Abstract

Because social networks such as Twitter disseminate information so effectively, people tend to check social networks and Web pages more often than traditional news sources such as newspapers, TV news programs, or radio channels. The information carried by shared social media posts therefore becomes considerably more important. However, most posts are either irrelevant or inaccurate. Moreover, even more critical than the correctness of the information is the speed at which it spreads on Twitter through replies and retweets. These activities complicate the situation further, given the unregulated nature of social networks and the lack of an immediate mechanism for verifying the correctness of posts. During the current Covid-19 (coronavirus disease) pandemic, Twitter is one of the most heavily used information resources apart from the official health administration institutions. Consequently, examining the correctness of Covid-19-related information with computational techniques (e.g., Data Mining, Machine Learning, and Deep Learning) has been gaining popularity and remains a substantial task. We therefore focused mainly on analyzing the correctness of pandemic-related posts shared on the Twitter platform. The overall goal of this work is to classify the relevant tweets using linear and non-linear machine learning models. Among all model configurations, we achieved the best F1 score (99%) with the neural network model using unigram features and a threshold value of 50.
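The abstract names three ingredients: unigram features, a threshold, and the F1 score. The following is a minimal sketch of how they fit together, not the authors' implementation. It assumes the threshold is a minimum corpus frequency for keeping a unigram (the abstract does not say what the value 50 applies to), uses a simple perceptron as a stand-in for a linear model, and runs on four made-up tweets with a threshold of 2. All function names and data are hypothetical.

```python
from collections import Counter


def tokenize(text):
    """Lowercase whitespace tokenizer (an assumption; the paper's preprocessing is not given)."""
    return text.lower().split()


def build_vocab(texts, min_count):
    """Keep unigrams that occur at least `min_count` times across the corpus."""
    counts = Counter(tok for text in texts for tok in tokenize(text))
    return sorted(tok for tok, c in counts.items() if c >= min_count)


def vectorize(text, vocab_index):
    """Map a text to a vector of unigram counts over the kept vocabulary."""
    vec = [0] * len(vocab_index)
    for tok in tokenize(text):
        i = vocab_index.get(tok)
        if i is not None:
            vec[i] += 1
    return vec


def train_perceptron(X, y, epochs=10):
    """Train a binary perceptron (linear model); labels are 0 or 1."""
    w = [0.0] * len(X[0])
    b = 0.0
    for _ in range(epochs):
        for x, label in zip(X, y):
            pred = predict(w, b, x)
            err = label - pred  # +1, 0, or -1
            if err:
                w = [wi + err * xi for wi, xi in zip(w, x)]
                b += err
    return w, b


def predict(w, b, x):
    """Return 1 if the linear score is positive, else 0."""
    score = sum(wi * xi for wi, xi in zip(w, x)) + b
    return 1 if score > 0 else 0


def f1_score(y_true, y_pred):
    """F1 = harmonic mean of precision and recall for the positive class."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Hypothetical toy data: 1 = relevant to Covid-19, 0 = irrelevant.
texts = [
    "covid vaccine update",
    "covid cases today",
    "football match today",
    "football score update",
]
labels = [1, 1, 0, 0]

vocab = build_vocab(texts, min_count=2)  # the paper reports a threshold of 50
vocab_index = {tok: i for i, tok in enumerate(vocab)}
X = [vectorize(t, vocab_index) for t in texts]
w, b = train_perceptron(X, labels)
train_preds = [predict(w, b, x) for x in X]
print("vocab:", vocab)
print("training F1:", f1_score(labels, train_preds))
```

The score here is computed on the training data only to keep the sketch short; a real comparison would evaluate on held-out tweets, and a non-linear model such as a neural network would replace the perceptron.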
