A multimodal transfer learning framework for the classification of disaster-related social media images

Saima Saleem, Anuradha Khattar, Monica Mehrotra
{"title":"用于灾害相关社交媒体图像分类的多模态迁移学习框架","authors":"Saima Saleem, Anuradha Khattar, Monica Mehrotra","doi":"10.3233/jifs-241271","DOIUrl":null,"url":null,"abstract":"Rapidly classifying disaster-related social media (SM) images during a catastrophe event is critical for enhancing disaster response efforts. However, the biggest challenge lies in acquiring labeled data for an ongoing (target) disaster to train supervised learning-based models, given that the labeling process is both time-consuming and costly. In this study, we address this challenge by proposing a new multimodal transfer learning framework for the real-time classification of SM images of the target disaster. The proposed framework is based on Contrastive Language-Image Pretraining (CLIP) model, jointly pretrained on a dataset of image-text pairs via contrastive learning. We propose two distinct methods to design our classification framework (1) Zero-Shot CLIP: it learns visual representations from images paired with natural language descriptions of classes. By utilizing the vision and language capabilities of CLIP, we extract meaningful features from unlabeled target disaster images and map them to semantically related textual class descriptions, enabling image classification without training on disaster-specific data. (2) Linear-Probe CLIP: it further enhances the performance and involves training a linear classifier on top of the pretrained CLIP model’s features, specifically tailored to the disaster image classification task. By optimizing the linear-probe classifier, we improve the model’s ability to discriminate between different classes and achieve higher performance without the need for labeled data of the target disaster. Both methods are evaluated on a benchmark X (formerly Twitter) dataset comprising images of seven real-world disaster events. The experimental outcomes showcase the efficacy of the proposed methods, with Linear-Probe CLIP achieving a remarkable 7% improvement in average F1-score relative to the state-of-the-art methods.","PeriodicalId":194936,"journal":{"name":"Journal of Intelligent & Fuzzy Systems","volume":"10 39","pages":""},"PeriodicalIF":0.0000,"publicationDate":"2024-05-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"A multimodal transfer learning framework for the classification of disaster-related social media images\",\"authors\":\"Saima Saleem, Anuradha Khattar, Monica Mehrotra\",\"doi\":\"10.3233/jifs-241271\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Rapidly classifying disaster-related social media (SM) images during a catastrophe event is critical for enhancing disaster response efforts. However, the biggest challenge lies in acquiring labeled data for an ongoing (target) disaster to train supervised learning-based models, given that the labeling process is both time-consuming and costly. In this study, we address this challenge by proposing a new multimodal transfer learning framework for the real-time classification of SM images of the target disaster. The proposed framework is based on Contrastive Language-Image Pretraining (CLIP) model, jointly pretrained on a dataset of image-text pairs via contrastive learning. We propose two distinct methods to design our classification framework (1) Zero-Shot CLIP: it learns visual representations from images paired with natural language descriptions of classes. 
By utilizing the vision and language capabilities of CLIP, we extract meaningful features from unlabeled target disaster images and map them to semantically related textual class descriptions, enabling image classification without training on disaster-specific data. (2) Linear-Probe CLIP: it further enhances the performance and involves training a linear classifier on top of the pretrained CLIP model’s features, specifically tailored to the disaster image classification task. By optimizing the linear-probe classifier, we improve the model’s ability to discriminate between different classes and achieve higher performance without the need for labeled data of the target disaster. Both methods are evaluated on a benchmark X (formerly Twitter) dataset comprising images of seven real-world disaster events. The experimental outcomes showcase the efficacy of the proposed methods, with Linear-Probe CLIP achieving a remarkable 7% improvement in average F1-score relative to the state-of-the-art methods.\",\"PeriodicalId\":194936,\"journal\":{\"name\":\"Journal of Intelligent & Fuzzy Systems\",\"volume\":\"10 39\",\"pages\":\"\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2024-05-07\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Journal of Intelligent & Fuzzy Systems\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.3233/jifs-241271\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Journal of Intelligent & Fuzzy Systems","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.3233/jifs-241271","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Cited by: 0

Abstract

Rapidly classifying disaster-related social media (SM) images during a catastrophe is critical for enhancing disaster response efforts. The biggest challenge, however, lies in acquiring labeled data for an ongoing (target) disaster to train supervised learning-based models, given that the labeling process is both time-consuming and costly. In this study, we address this challenge by proposing a new multimodal transfer learning framework for the real-time classification of SM images of the target disaster. The proposed framework is based on the Contrastive Language-Image Pretraining (CLIP) model, jointly pretrained on a dataset of image-text pairs via contrastive learning. We propose two distinct methods to design our classification framework: (1) Zero-Shot CLIP, which learns visual representations from images paired with natural language descriptions of classes. By utilizing the vision and language capabilities of CLIP, we extract meaningful features from unlabeled target disaster images and map them to semantically related textual class descriptions, enabling image classification without training on disaster-specific data. (2) Linear-Probe CLIP, which further enhances performance by training a linear classifier on top of the pretrained CLIP model's features, specifically tailored to the disaster image classification task. By optimizing the linear-probe classifier, we improve the model's ability to discriminate between different classes and achieve higher performance without the need for labeled data of the target disaster. Both methods are evaluated on a benchmark X (formerly Twitter) dataset comprising images of seven real-world disaster events. The experimental outcomes showcase the efficacy of the proposed methods, with Linear-Probe CLIP achieving a remarkable 7% improvement in average F1-score relative to state-of-the-art methods.
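As a concrete illustration of the zero-shot setup described above (not the authors' exact pipeline), the sketch below scores an unlabeled disaster image against natural-language class descriptions using the Hugging Face transformers CLIP implementation. The checkpoint, file path, and class prompts are assumptions for illustration, not the paper's prompt set or taxonomy.

```python
# Minimal zero-shot CLIP classification sketch: embed an image and a set of
# textual class descriptions, then pick the description with the highest
# image-text similarity. No disaster-specific training is involved.
import torch
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

# Hypothetical class descriptions for a disaster-image taxonomy.
class_prompts = [
    "a photo of damaged buildings after a disaster",
    "a photo of people affected by a disaster",
    "a photo of rescue and relief efforts",
    "a photo not related to any disaster",
]

image = Image.open("tweet_image.jpg")  # placeholder path
inputs = processor(text=class_prompts, images=image,
                   return_tensors="pt", padding=True)
with torch.no_grad():
    outputs = model(**inputs)

# logits_per_image holds scaled cosine similarities between the image
# embedding and each text embedding; softmax turns them into class scores.
probs = outputs.logits_per_image.softmax(dim=-1)
print(class_prompts[probs.argmax().item()])
```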
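The linear-probe variant can be sketched in the same spirit: freeze the CLIP image encoder, extract embeddings, and fit a simple linear classifier on them (logistic regression is the standard linear-probe choice). The assumed setup here is that labeled images come from past source disasters rather than the target event; all file paths and labels below are hypothetical placeholders.

```python
# Sketch of a linear probe on frozen CLIP image features.
import torch
from PIL import Image
from sklearn.linear_model import LogisticRegression
from transformers import CLIPModel, CLIPProcessor

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32").eval()
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

def embed(paths):
    """Return L2-normalized CLIP image embeddings for a list of image files."""
    images = [Image.open(p) for p in paths]
    inputs = processor(images=images, return_tensors="pt")
    with torch.no_grad():
        feats = model.get_image_features(**inputs)
    return (feats / feats.norm(dim=-1, keepdim=True)).numpy()

# Hypothetical labeled data from past (source) disasters.
train_paths = ["src_flood_001.jpg", "src_quake_001.jpg"]
train_labels = [0, 1]
# Hypothetical unlabeled images from the ongoing (target) disaster.
target_paths = ["target_event_001.jpg"]

# Train the linear classifier on frozen features, then classify target images.
clf = LogisticRegression(max_iter=1000).fit(embed(train_paths), train_labels)
predictions = clf.predict(embed(target_paths))
print(predictions)
```

Because only the small linear head is trained while the CLIP backbone stays frozen, the probe is cheap to fit and transfers the pretrained vision-language representation to the disaster taxonomy.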