Classifying Natural Disaster Tweet using a Convolutional Neural Network and BERT Embedding

Lucas Satria Aji Dharma, E. Winarko
Published in: 2022 2nd International Conference on Information Technology and Education (ICIT&E)
Publication date: 2022-01-22
DOI: 10.1109/ICITE54466.2022.9759860 (https://doi.org/10.1109/ICITE54466.2022.9759860)
Citations: 1

Abstract

Social media platforms have become a vast source of information on the internet. Twitter is one of the more popular microblogging platforms, and the more users a platform has, the more varied the information posted on it each day. On Twitter, users express themselves in the form of tweets, which appear on Twitter's timeline where other users can see them. If a tweet goes viral, Twitter places it on the trending page, allowing even more users to view it. During a natural disaster, many of the tweets being posted mention the disaster, making it a trending topic on Twitter. A vast number of tweets about a disaster can therefore be collected as data, but not all of them contain information about the disaster. Some tweets use natural disaster words without actually discussing the disaster itself; these are not informative and can be classified as non-disaster tweets. This paper proposes a system to classify disaster tweets and non-disaster tweets during a disaster. The proposed method is based on a Convolutional Neural Network (CNN) that uses Bidirectional Encoder Representations from Transformers (BERT) as the embedding. For comparison, it is evaluated against another embedding method, Word2Vec. After training and testing, the CNN with BERT embeddings gave the most consistent results, attaining an accuracy of 97.16%, a precision of 97.63%, a recall of 96.64%, and an F1-score of 97.13%.
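The four reported metrics are related by standard definitions, and the F1-score is the harmonic mean of precision and recall. The following is a minimal sketch of those definitions, not the authors' evaluation code: the function names and the confusion-matrix counts in the usage example are hypothetical, and only the precision, recall, and F1 figures are taken from the abstract.

```python
def f1_from(precision, recall):
    """Harmonic mean of precision and recall (0.0 if both are zero)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def classification_metrics(tp, fp, fn, tn):
    """Return (accuracy, precision, recall, f1) for a binary classifier.

    tp/fp/fn/tn are confusion-matrix counts, with "disaster tweet"
    assumed to be the positive class.
    """
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return accuracy, precision, recall, f1_from(precision, recall)


# Values reported in the abstract, in percent.
reported_precision = 97.63
reported_recall = 96.64
reported_f1 = 97.13

# Recompute F1 from the reported precision and recall.
recomputed_f1 = f1_from(reported_precision, reported_recall)
print(round(recomputed_f1, 2), reported_f1)

# Hypothetical counts, for illustration only (not from the paper).
accuracy, precision, recall, f1 = classification_metrics(tp=90, fp=10, fn=5, tn=95)
print(accuracy, precision, round(recall, 4), round(f1, 4))
```

Recomputing F1 from the reported precision and recall gives a value that rounds to the reported 97.13%, which supports reading the recall figure as a percentage like the other metrics.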