{"title":"Classification of Fire Related Tweets on Twitter Using Bidirectional Encoder Representations from Transformers (BERT)","authors":"Jairus Mingua, Dionis A. Padilla, Evan Joy Celino","doi":"10.1109/HNICEM54116.2021.9731956","DOIUrl":null,"url":null,"abstract":"Bidirectional Encoder Representation from Transformers (BERT) is a transfer learning model approach in natural language processing (NLP). BERT has different types of pre-trained models that can pre-train a language representation to create a model that can be finetuned on specific tasks using a dataset like text classification to produce state of the art predictions. Recent studies providing the use of BERT in natural language processing have highlighted that there are no publicly available Filipino tweet datasets regarding fire reports on social media that lead to a lack of classification models. This paper aims to design and implement a system to classify Filipino tweets using different pre-trained BERT models. Upon creating a model exclusive for organizing Filipino tweets using 2,081 tweets as a dataset that contains fire-related tweets, the researchers were able to compare the accuracy of the different finetuned pre-trained BERT models. The data shows a significant difference in the accuracy of each pre-trained BERT model. The highest of which is the BERT Base Uncased WWM model with a test accuracy of 87.50% and a train loss of 0.06 during training of the dataset. The least accurate among the pre-trained BERT models is the BERT Base Cased WWM model, with a test accuracy of 76.34% and a train loss of 0.2. The result shows that BERT Base Uncased WWM model can be a reliable model in classifying fire-related tweets. The accuracy obtained by the models may vary depending on how large the dataset is.","PeriodicalId":129868,"journal":{"name":"2021 IEEE 13th International Conference on Humanoid, Nanotechnology, Information Technology, Communication and Control, Environment, and Management (HNICEM)","volume":"1 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2021-11-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"3","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"2021 IEEE 13th International Conference on Humanoid, Nanotechnology, Information Technology, Communication and Control, Environment, and Management (HNICEM)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/HNICEM54116.2021.9731956","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Citations: 3
Abstract
Bidirectional Encoder Representations from Transformers (BERT) is a transfer learning approach in natural language processing (NLP). BERT offers several pre-trained language-representation models that can be fine-tuned on specific tasks, such as text classification, to produce state-of-the-art predictions. Recent studies on the use of BERT in natural language processing have highlighted that there are no publicly available Filipino tweet datasets covering fire reports on social media, which has led to a lack of classification models. This paper aims to design and implement a system that classifies Filipino tweets using different pre-trained BERT models. After building a model dedicated to organizing Filipino tweets from a dataset of 2,081 fire-related tweets, the researchers compared the accuracy of the different fine-tuned pre-trained BERT models. The data show a significant difference in accuracy across the pre-trained BERT models. The highest is the BERT Base Uncased WWM model, with a test accuracy of 87.50% and a train loss of 0.06 during training on the dataset. The least accurate among the pre-trained BERT models is the BERT Base Cased WWM model, with a test accuracy of 76.34% and a train loss of 0.2. The results show that the BERT Base Uncased WWM model can be a reliable model for classifying fire-related tweets. The accuracy obtained by the models may vary depending on the size of the dataset.
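As a rough illustration of the fine-tuning workflow the abstract describes, the sketch below fine-tunes a pre-trained BERT checkpoint for binary tweet classification (fire-related vs. not) with the Hugging Face `transformers` library. The checkpoint name, file path, label scheme, and hyperparameters are assumptions for illustration only; they are not the paper's actual dataset of 2,081 Filipino tweets or its WWM configurations.

```python
# Minimal sketch of fine-tuning BERT for tweet classification.
# Assumes a hypothetical CSV "fire_tweets.csv" with columns: text, label (0/1).
import pandas as pd
import torch
from torch.utils.data import Dataset
from transformers import (BertTokenizerFast, BertForSequenceClassification,
                          Trainer, TrainingArguments)

class TweetDataset(Dataset):
    """Wraps tokenized tweets and labels for the Trainer API."""
    def __init__(self, texts, labels, tokenizer, max_len=128):
        self.enc = tokenizer(texts, truncation=True,
                             padding="max_length", max_length=max_len)
        self.labels = labels

    def __len__(self):
        return len(self.labels)

    def __getitem__(self, idx):
        item = {k: torch.tensor(v[idx]) for k, v in self.enc.items()}
        item["labels"] = torch.tensor(self.labels[idx])
        return item

# Illustrative train/test split (not the paper's split).
df = pd.read_csv("fire_tweets.csv")
train_df = df.sample(frac=0.8, random_state=42)
test_df = df.drop(train_df.index)

# "bert-base-uncased" stands in for the paper's BERT Base Uncased WWM model.
tokenizer = BertTokenizerFast.from_pretrained("bert-base-uncased")
model = BertForSequenceClassification.from_pretrained("bert-base-uncased",
                                                      num_labels=2)

train_ds = TweetDataset(train_df["text"].tolist(),
                        train_df["label"].tolist(), tokenizer)
test_ds = TweetDataset(test_df["text"].tolist(),
                       test_df["label"].tolist(), tokenizer)

def accuracy(eval_pred):
    # Compute simple accuracy from logits and gold labels.
    logits, labels = eval_pred
    preds = logits.argmax(axis=-1)
    return {"accuracy": float((preds == labels).mean())}

args = TrainingArguments(output_dir="bert-fire-clf",
                         num_train_epochs=3,
                         per_device_train_batch_size=16,
                         evaluation_strategy="epoch",
                         logging_steps=50)

trainer = Trainer(model=model, args=args,
                  train_dataset=train_ds, eval_dataset=test_ds,
                  compute_metrics=accuracy)
trainer.train()
print(trainer.evaluate())  # reports test accuracy after fine-tuning
```

Comparing checkpoints as in the paper would amount to repeating this loop with different pre-trained model names (cased vs. uncased, with or without whole-word masking) and recording each model's test accuracy and train loss.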