DisTGranD: Granular event/sub-event classification for disaster response

Q1 Social Sciences

Online Social Networks and Media · Pub Date: 2025-01-01 · DOI: 10.1016/j.osnem.2024.100297

Ademola Adesokan, Sanjay Madria, Long Nguyen

Volume 45, Article 100297. Journal Article. Full text: https://www.sciencedirect.com/science/article/pii/S2468696424000223


Abstract

Efficient crisis management relies on prompt and precise analysis of disaster data from various sources, including social media. Fine-grained, annotated, class-labeled data provides a more diverse range of information than datasets with only high-level labels. In this study, we introduce a dataset richly annotated at a low level to classify crisis-related communication more accurately. To this end, we first present DisTGranD, an extensively annotated dataset of over 47,600 tweets related to earthquakes and hurricanes. The dataset follows the Automatic Content Extraction (ACE) standard to provide dual-layer annotation of events and sub-events and to identify critical triggers and supporting arguments. Inter-annotator evaluation of DisTGranD showed high agreement, with Fleiss' kappa scores of 0.90 and 0.93 for event and sub-event types, respectively. Moreover, in a transformer-based embedded phrase extraction method, XLNet achieved a 96% intra-label similarity score for event type and 97% for sub-event type. We further propose a novel deep learning classification model, RoBiCCus, which achieved ≥90% accuracy and F1-score in the event and sub-event type classification tasks on our DisTGranD dataset and outperformed other models on publicly available disaster datasets. The DisTGranD dataset represents a nuanced class-labeled framework for detecting and classifying disaster-related social media content, which can significantly aid decision-making in disaster response. This robust dataset enables deep-learning models to provide insightful, actionable data during crises. Our annotated dataset and code are publicly available on GitHub.
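For readers unfamiliar with the agreement statistic quoted above, the following is a minimal sketch of how Fleiss' kappa is computed from per-item annotator labels. It is not the authors' evaluation code: the function name `fleiss_kappa`, the label names `damage` and `rescue`, and the toy annotations are all hypothetical and chosen only for illustration.

```python
from collections import Counter


def fleiss_kappa(ratings, categories):
    """Fleiss' kappa for a list of items, each rated by the same number of raters.

    ratings    -- list of lists; ratings[i] holds the labels given to item i
    categories -- iterable of all possible labels
    """
    n_items = len(ratings)
    n_raters = len(ratings[0])
    if n_raters < 2 or any(len(r) != n_raters for r in ratings):
        raise ValueError("every item needs the same number (>= 2) of ratings")

    counts = [Counter(r) for r in ratings]

    # Observed agreement: share of agreeing rater pairs, averaged over items.
    per_item = [
        (sum(c[k] ** 2 for k in categories) - n_raters)
        / (n_raters * (n_raters - 1))
        for c in counts
    ]
    p_observed = sum(per_item) / n_items

    # Chance agreement: sum of squared overall category proportions.
    total = n_items * n_raters
    p_expected = sum((sum(c[k] for c in counts) / total) ** 2 for k in categories)

    if p_expected == 1.0:
        raise ValueError("kappa is undefined when only one category is ever used")
    return (p_observed - p_expected) / (1.0 - p_expected)


if __name__ == "__main__":
    # Hypothetical toy data: 4 tweets, 3 annotators, 2 made-up event labels.
    toy = [
        ["damage", "damage", "damage"],
        ["rescue", "rescue", "rescue"],
        ["damage", "damage", "rescue"],
        ["rescue", "rescue", "rescue"],
    ]
    print(round(fleiss_kappa(toy, ["damage", "rescue"]), 3))
```

Values near 1 indicate agreement well above chance, which is the sense in which the reported 0.90 and 0.93 are described as high.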


