低资源场景下的跨语言跨领域危机分类

Cinthia Sánchez, Hernan Sarmiento, Andres Abeliuk, Jorge Pérez, Barbara Poblete
{"title":"低资源场景下的跨语言跨领域危机分类","authors":"Cinthia Sánchez, Hernan Sarmiento, Andres Abeliuk, Jorge Pérez, Barbara Poblete","doi":"10.1609/icwsm.v17i1.22185","DOIUrl":null,"url":null,"abstract":"Social media data has emerged as a useful source of timely information about real-world crisis events. One of the main tasks related to the use of social media for disaster management is the automatic identification of crisis-related messages. Most of the studies on this topic have focused on the analysis of data for a particular type of event in a specific language. This limits the possibility of generalizing existing approaches because models cannot be directly applied to new types of events or other languages. In this work, we study the task of automatically classifying messages that are related to crisis events by leveraging cross-language and cross-domain labeled data. Our goal is to make use of labeled data from high-resource languages to classify messages from other (low-resource) languages and/or of new (previously unseen) types of crisis situations. For our study we consolidated from the literature a large unified dataset containing multiple crisis events and languages. Our empirical findings show that it is indeed possible to leverage data from crisis events in English to classify the same type of event in other languages, such as Spanish and Italian (80.0% F1-score). Furthermore, we achieve good performance for the cross-domain task (80.0% F1-score) in a cross-lingual setting. Overall, our work contributes to improving the data scarcity problem that is so important for multilingual crisis classification. In particular, mitigating cold-start situations in emergency events, when time is of essence.","PeriodicalId":338112,"journal":{"name":"Proceedings of the International AAAI Conference on Web and Social Media","volume":"16 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2023-06-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"Cross-Lingual and Cross-Domain Crisis Classification for Low-Resource Scenarios\",\"authors\":\"Cinthia Sánchez, Hernan Sarmiento, Andres Abeliuk, Jorge Pérez, Barbara Poblete\",\"doi\":\"10.1609/icwsm.v17i1.22185\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Social media data has emerged as a useful source of timely information about real-world crisis events. One of the main tasks related to the use of social media for disaster management is the automatic identification of crisis-related messages. Most of the studies on this topic have focused on the analysis of data for a particular type of event in a specific language. This limits the possibility of generalizing existing approaches because models cannot be directly applied to new types of events or other languages. In this work, we study the task of automatically classifying messages that are related to crisis events by leveraging cross-language and cross-domain labeled data. Our goal is to make use of labeled data from high-resource languages to classify messages from other (low-resource) languages and/or of new (previously unseen) types of crisis situations. For our study we consolidated from the literature a large unified dataset containing multiple crisis events and languages. Our empirical findings show that it is indeed possible to leverage data from crisis events in English to classify the same type of event in other languages, such as Spanish and Italian (80.0% F1-score). Furthermore, we achieve good performance for the cross-domain task (80.0% F1-score) in a cross-lingual setting. Overall, our work contributes to improving the data scarcity problem that is so important for multilingual crisis classification. In particular, mitigating cold-start situations in emergency events, when time is of essence.\",\"PeriodicalId\":338112,\"journal\":{\"name\":\"Proceedings of the International AAAI Conference on Web and Social Media\",\"volume\":\"16 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2023-06-02\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Proceedings of the International AAAI Conference on Web and Social Media\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1609/icwsm.v17i1.22185\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Proceedings of the International AAAI Conference on Web and Social Media","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1609/icwsm.v17i1.22185","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 0

摘要

社交媒体数据已成为有关现实世界危机事件的及时信息的有用来源。与使用社会媒体进行灾害管理有关的主要任务之一是自动识别与危机有关的信息。关于这一主题的大多数研究都集中在对特定语言中特定类型事件的数据进行分析。这限制了推广现有方法的可能性,因为模型不能直接应用于新类型的事件或其他语言。在这项工作中,我们研究了通过利用跨语言和跨领域标记数据对与危机事件相关的消息进行自动分类的任务。我们的目标是利用来自高资源语言的标记数据来对来自其他(低资源)语言和/或新的(以前未见过的)危机情况类型的消息进行分类。在我们的研究中,我们从文献中整合了一个包含多种危机事件和语言的大型统一数据集。我们的实证研究结果表明,确实有可能利用英语危机事件的数据来对其他语言(如西班牙语和意大利语)的同一类型事件进行分类(80.0%的f1得分)。此外,我们在跨语言设置的跨域任务中取得了良好的性能(80.0% f1得分)。总的来说,我们的工作有助于改善数据稀缺性问题,这对多语言危机分类非常重要。特别是,在时间紧迫的紧急情况下,减轻冷启动情况。
本文章由计算机程序翻译,如有差异,请以英文原文为准。
Cross-Lingual and Cross-Domain Crisis Classification for Low-Resource Scenarios
Social media data has emerged as a useful source of timely information about real-world crisis events. One of the main tasks related to the use of social media for disaster management is the automatic identification of crisis-related messages. Most of the studies on this topic have focused on the analysis of data for a particular type of event in a specific language. This limits the possibility of generalizing existing approaches because models cannot be directly applied to new types of events or other languages. In this work, we study the task of automatically classifying messages that are related to crisis events by leveraging cross-language and cross-domain labeled data. Our goal is to make use of labeled data from high-resource languages to classify messages from other (low-resource) languages and/or of new (previously unseen) types of crisis situations. For our study we consolidated from the literature a large unified dataset containing multiple crisis events and languages. Our empirical findings show that it is indeed possible to leverage data from crisis events in English to classify the same type of event in other languages, such as Spanish and Italian (80.0% F1-score). Furthermore, we achieve good performance for the cross-domain task (80.0% F1-score) in a cross-lingual setting. Overall, our work contributes to improving the data scarcity problem that is so important for multilingual crisis classification. In particular, mitigating cold-start situations in emergency events, when time is of essence.
求助全文
通过发布文献求助,成功后即可免费获取论文全文。 去求助
来源期刊
自引率
0.00%
发文量
0
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
确定
请完成安全验证×
copy
已复制链接
快去分享给好友吧!
我知道了
右上角分享
点击右上角分享
0
联系我们:info@booksci.cn Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。 Copyright © 2023 布克学术 All rights reserved.
京ICP备2023020795号-1
ghs 京公网安备 11010802042870号
Book学术文献互助
Book学术文献互助群
群 号:481959085
Book学术官方微信