Classifying Natural Disaster Tweet using a Convolutional Neural Network and BERT Embedding

2022 2nd International Conference on Information Technology and Education (ICIT&E) Pub Date : 2022-01-22 DOI:10.1109/ICITE54466.2022.9759860

Lucas Satria Aji Dharma, E. Winarko


Citations: 1

Abstract

Social media platforms have become a vast source of information on the internet. Twitter is one of the more popular microblogging platforms, and the more users a platform has, the more varied the information posted each day. On Twitter, users express themselves in the form of tweets, which appear on Twitter's timeline where other users can see them. If a tweet goes viral, Twitter places it on the trending page, allowing even more users to view it. During a natural disaster, many of the tweets being posted mention the disaster, making it a trending topic on Twitter. A vast number of tweets about a disaster can therefore be collected as data, but not all of them contain information about the disaster. Some tweets use natural disaster words without actually discussing the disaster itself; they are not informative and can be classified as non-disaster tweets. This paper proposes a system to classify disaster tweets and non-disaster tweets during a disaster. The proposed method is based on a Convolutional Neural Network (CNN) that uses Bidirectional Encoder Representations from Transformers (BERT) as the embedding. For comparison, it is evaluated against another embedding method, Word2Vec. After training and testing, the CNN with BERT embeddings gave the most consistent results, attaining an accuracy of 97.16%, a precision of 97.63%, a recall of 96.64%, and an F1-score of 97.13%.
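The four reported metrics are related: the F1-score is the harmonic mean of precision and recall. The sketch below is a minimal illustration of the standard definitions, not the authors' evaluation code. The function names and the confusion-matrix counts in the usage note are hypothetical. It also shows that the reported precision and recall, with the recall read as 96.64%, reproduce the reported F1-score.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall, both given as fractions in [0, 1]."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def classification_metrics(tp: int, fp: int, fn: int, tn: int) -> dict:
    """Standard binary-classification metrics from confusion-matrix counts.

    tp/fp/fn/tn are counts of true positives, false positives,
    false negatives, and true negatives. Here "positive" is assumed
    to mean "disaster tweet" (an assumption for illustration).
    """
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total if total else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return {
        "accuracy": accuracy,
        "precision": precision,
        "recall": recall,
        "f1": f1_score(precision, recall),
    }


# Consistency check using the precision and recall quoted in the abstract.
reported_f1 = round(f1_score(0.9763, 0.9664) * 100, 2)
print(reported_f1)
```

With hypothetical counts, `classification_metrics(tp=90, fp=10, fn=10, tn=90)` yields 0.9 for each of the four metrics, which is a quick sanity check when wiring the function into an evaluation loop.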
