Estimating Distributed Representation Performance in Disaster-Related Social Media Classification

2019 IEEE/ACM International Conference on Advances in Social Networks Analysis and Mining (ASONAM). Pub Date: 2019-08-01. DOI: 10.1145/3341161.3343680

P. Jain, R. Ross, Bianca Schoen-Phelan


Citations: 13

Abstract

This paper examines the effectiveness of a range of pre-trained language representations for determining the informativeness and information type of social media posts in the event of natural or man-made disasters. Within the context of disaster tweet analysis, we aim to analyse tweets accurately while minimising both false positives and false negatives in the automated information analysis. The investigation is performed across a number of well-known disaster-related Twitter datasets. Models built on pre-trained word embeddings from Word2Vec, GloVe, ELMo and BERT are used for performance evaluation. Given the relative ubiquity of BERT as a standout language representation in recent times, BERT was expected to dominate the results. However, the results are more diverse, with classical Word2Vec and GloVe both performing strongly. As part of the analysis, we discuss some challenges related to automated Twitter analysis, including the fine-tuning of language models to disaster-related scenarios.
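The abstract's goal of minimising both false positives and false negatives can be made concrete with precision, recall and F1. The following is a minimal sketch, not the authors' evaluation code: the function names, the label encoding (1 = informative, 0 = not informative) and the toy labels are all assumptions made for illustration.

```python
from typing import Sequence, Tuple


def confusion_counts(
    y_true: Sequence[int], y_pred: Sequence[int], positive: int = 1
) -> Tuple[int, int, int, int]:
    """Return (tp, fp, fn, tn) for a binary task (hypothetical helper)."""
    tp = fp = fn = tn = 0
    for truth, pred in zip(y_true, y_pred):
        if pred == positive and truth == positive:
            tp += 1  # informative tweet correctly flagged
        elif pred == positive:
            fp += 1  # false positive: noise flagged as informative
        elif truth == positive:
            fn += 1  # false negative: informative tweet missed
        else:
            tn += 1
    return tp, fp, fn, tn


def precision_recall_f1(
    y_true: Sequence[int], y_pred: Sequence[int], positive: int = 1
) -> Tuple[float, float, float]:
    """Precision penalises false positives; recall penalises false negatives."""
    tp, fp, fn, _ = confusion_counts(y_true, y_pred, positive)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


# Toy labels (assumed): 1 = informative, 0 = not informative.
y_true = [1, 1, 1, 0, 0, 0]
y_pred = [1, 1, 0, 1, 0, 0]
print(confusion_counts(y_true, y_pred))
print(precision_recall_f1(y_true, y_pred))
```

Because F1 is the harmonic mean of precision and recall, a classifier scores well on it only when both error types stay low, which is one common way to compare embedding-based models under the stated aim.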



