Disaster Tweet Classification Based On Geospatial Data Using the BERT-MLP Method

Iqbal Maulana, W. Maharani
2021 9th International Conference on Information and Communication Technology (ICoICT)
Published: 2021-08-03
DOI: https://doi.org/10.1109/ICoICT52021.2021.9527513
Citations: 3

Abstract

Twitter is a popular social media platform worldwide and in Indonesia, and many topics trend on it, including natural disasters that have occurred in Indonesia. The DKI Jakarta flood in early 2020 featured prominently among Twitter's trending topics. This study aims to classify these tweets as "flooded" or "not flooded" using the tweet text and geospatial features. The proposed model is BERT-MLP: Bidirectional Encoder Representations from Transformers (BERT) is used as the pre-trained model to classify the tweets, and a Multi-Layer Perceptron (MLP) is used to classify the geospatial features. The scenarios designed for the model vary the preprocessing of tweets as follows: without stopword removal, without stemming, with both, and without both. Once classified, the tweets are visualized on a two-dimensional interactive map. The best scenario, without stemming and with stopword removal, reaches an accuracy of 82%. This is because the stemming process eliminates around 6% of the features in the tweets. The study also shows how negative-context tweets influence the "not flooded" class, with an orientation of 65% of the total data. However, manually defining the stopword list can affect the results, because stopword removal will not delete words that still carry features related to the topic.
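The four preprocessing scenarios described in the abstract can be read as every combination of two switches: stopword removal and stemming. The following is a minimal sketch of that idea only, not the authors' pipeline. The stopword set, the suffix list, and the names `toy_stem`, `preprocess`, and `run_scenarios` are all hypothetical stand-ins chosen for illustration; a real experiment would use a proper Indonesian stemmer and the manually defined stopword list that the abstract mentions.

```python
from itertools import product

# Hypothetical, tiny stopword list; the paper defines its own list manually.
STOPWORDS = {"di", "dan", "yang", "ini"}

# Hypothetical suffixes for a toy suffix stripper (NOT a real Indonesian stemmer).
SUFFIXES = ("nya", "kan", "an")


def toy_stem(token):
    """Strip one known suffix if at least four characters remain."""
    for suffix in SUFFIXES:
        if token.endswith(suffix) and len(token) - len(suffix) >= 4:
            return token[: -len(suffix)]
    return token


def preprocess(text, remove_stopwords, stem):
    """Lowercase and whitespace-tokenize, then apply the enabled steps."""
    tokens = text.lower().split()
    if remove_stopwords:
        tokens = [t for t in tokens if t not in STOPWORDS]
    if stem:
        tokens = [toy_stem(t) for t in tokens]
    return tokens


def run_scenarios(text):
    """Return tokens for all four (remove_stopwords, stem) combinations."""
    return {
        (remove_stopwords, stem): preprocess(text, remove_stopwords, stem)
        for remove_stopwords, stem in product((False, True), repeat=2)
    }


if __name__ == "__main__":
    tweet = "Banjir di Jakarta dan jalanan tergenang"
    for (remove_stopwords, stem), tokens in run_scenarios(tweet).items():
        print(f"stopword removal={remove_stopwords}, stemming={stem}: {tokens}")
```

In this sketch, the scenario the abstract reports as best (with stopword removal, without stemming) corresponds to the key `(True, False)`. Comparing the token sets between the stemmed and unstemmed scenarios is one simple way to see how stemming can merge or drop distinct features.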