Disaster Tweet Classification Based On Geospatial Data Using the BERT-MLP Method
Authors: Iqbal Maulana, W. Maharani
Venue: 2021 9th International Conference on Information and Communication Technology (ICoICT)
Publication date: 2021-08-03
DOI: 10.1109/ICoICT52021.2021.9527513 (https://doi.org/10.1109/ICoICT52021.2021.9527513)
Citation count: 3
Abstract
Twitter is a popular social media platform worldwide and in Indonesia, and a wide range of topics trend on it, including natural disasters that have occurred in Indonesia. The DKI Jakarta flood in early 2020 featured prominently among Twitter's trending topics. This study aims to classify these tweets as "flooded" or "not flooded" using both the tweet text and geospatial features. The proposed model is BERT-MLP: a pre-trained Bidirectional Encoder Representations from Transformers (BERT) model classifies the tweets, and a Multi-Layer Perceptron (MLP) classifies the geospatial features. The experimental scenarios vary the tweet preprocessing: without stopword removal, without stemming, with both, and without both. Once classified, the tweets are visualized on a two-dimensional interactive map. The best scenario, without stemming and with stopword removal, reaches an accuracy of 82%. This is because the stemming process eliminates around 6% of the features in the tweets. The study also shows the influence of negative-context tweets on the "not flooded" class, with a share of 65% of the total data. However, manually defining the stopword list can affect the results, because stopword removal will not delete words that still carry features related to the topic.
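The four preprocessing scenarios named in the abstract can be read as the combinations of two switches, stopword removal and stemming. The following is a minimal sketch of that reading, not the authors' pipeline. The stopword list, the `toy_stem` suffix stripper, and the function names are hypothetical placeholders, and it assumes that "without stopword removal" means stemming only and "without stemming" means stopword removal only.

```python
# Hypothetical, tiny stopword list for illustration only (not the study's list).
STOPWORDS = {"di", "yang", "dan", "the", "is", "in"}


def toy_stem(token: str) -> str:
    """Placeholder stemmer: strips a few suffixes. Not a real stemming algorithm."""
    for suffix in ("ing", "ed", "s"):
        # Only strip when enough of the word remains to stay recognizable.
        if token.endswith(suffix) and len(token) > len(suffix) + 2:
            return token[: -len(suffix)]
    return token


def preprocess(text: str, remove_stopwords: bool, stem: bool) -> list[str]:
    """Lowercase and whitespace-tokenize, then apply the enabled steps."""
    tokens = text.lower().split()
    if remove_stopwords:
        tokens = [t for t in tokens if t not in STOPWORDS]
    if stem:
        tokens = [toy_stem(t) for t in tokens]
    return tokens


# Scenario name -> (remove_stopwords, stem). The mapping is an assumption
# about how the abstract's scenario names translate into switches.
SCENARIOS = {
    "without stopword removal": (False, True),
    "without stemming": (True, False),
    "with both": (True, True),
    "without both": (False, False),
}


def run_scenarios(text: str) -> dict[str, list[str]]:
    """Return the token list produced by each preprocessing scenario."""
    return {name: preprocess(text, *flags) for name, flags in SCENARIOS.items()}


if __name__ == "__main__":
    for name, tokens in run_scenarios("The street is flooded in Jakarta").items():
        print(f"{name}: {tokens}")
```

In this toy example the stemmer reduces "flooded" to "flood", which illustrates how stemming can collapse or discard surface features that a classifier might otherwise use. For real Indonesian tweets, a proper stemmer and a curated stopword list would replace the placeholders.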