Estimating Distributed Representation Performance in Disaster-Related Social Media Classification

P. Jain, R. Ross, Bianca Schoen-Phelan
DOI: 10.1145/3341161.3343680 (https://doi.org/10.1145/3341161.3343680)
Published in: 2019 IEEE/ACM International Conference on Advances in Social Networks Analysis and Mining (ASONAM)
Publication date: 2019-08-01
Citations: 13

Abstract

This paper examines the effectiveness of a range of pre-trained language representations for determining the informativeness and information type of social media posts during natural or man-made disasters. Within the context of disaster tweet analysis, we aim to analyse tweets accurately while minimising both false positives and false negatives in the automated information analysis. The investigation is performed across a number of well-known disaster-related Twitter datasets. Models built from pre-trained word embeddings from Word2Vec, GloVe, ELMo and BERT are used for performance evaluation. Given the relative ubiquity of BERT as a standout language representation in recent times, BERT was expected to dominate the results. However, the results are more diverse, with the classical Word2Vec and GloVe embeddings both performing strongly. As part of the analysis, we discuss some challenges related to automated Twitter analysis, including the fine-tuning of language models to disaster-related scenarios.
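The abstract mentions two ideas that a small example can make concrete: turning a tweet into a fixed-size vector from pre-trained word embeddings, and scoring a classifier by its false positives and false negatives. The sketch below is illustrative only and is not the authors' pipeline. The embedding table `TOY_EMBEDDINGS`, the mean-pooling step, and the choice of precision, recall and F1 as metrics are all assumptions made for this example; a real experiment would load actual Word2Vec or GloVe vectors instead of the toy values.

```python
from typing import Dict, List, Sequence, Tuple

# Toy 3-dimensional vectors standing in for pre-trained embeddings.
# The words and values are hypothetical, chosen only for illustration.
TOY_EMBEDDINGS: Dict[str, List[float]] = {
    "flood": [0.9, 0.1, 0.0],
    "rescue": [0.8, 0.2, 0.1],
    "coffee": [0.0, 0.9, 0.3],
}


def mean_pool(tokens: Sequence[str],
              table: Dict[str, List[float]],
              dim: int = 3) -> List[float]:
    """Average the vectors of known tokens; unknown tokens are skipped."""
    vectors = [table[t] for t in tokens if t in table]
    if not vectors:
        # No known token: fall back to a zero vector.
        return [0.0] * dim
    return [sum(column) / len(vectors) for column in zip(*vectors)]


def precision_recall_f1(y_true: Sequence[int],
                        y_pred: Sequence[int]) -> Tuple[float, float, float]:
    """Binary metrics where label 1 means 'informative' (an assumed convention)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false positives
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # false negatives
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


if __name__ == "__main__":
    features = mean_pool(["flood", "rescue", "now"], TOY_EMBEDDINGS)
    print("tweet vector:", features)
    print("P/R/F1:", precision_recall_f1([1, 1, 0, 0, 1], [1, 0, 1, 0, 1]))
```

Precision falls as false positives rise and recall falls as false negatives rise, so reporting both (or their harmonic mean, F1) is one common way to reflect the stated goal of keeping both error types low.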