SLIP-flood: Soft-combination of Swin Transformer and Lightweight Language-Image Pre-training for Flood Images Classification
Heng Tang, Xiaoping Rui, Jiarui Li, Ninglei Ouyang, Yiheng Xie, Xiaodie Liu, Yiming Bi
International Journal of Applied Earth Observation and Geoinformation, Vol. 139, Article 104543, 25 April 2025. DOI: 10.1016/j.jag.2025.104543
Abstract
Flood monitoring is a complex task involving multimodal data mining and multitask collaboration. To leverage multimodal data in flood management, there is an urgent need to conduct vision-language pretraining (VLP) for flood disaster monitoring and to obtain foundation pretraining models suited to multiple downstream flood-related tasks. This paper introduces SLIP-Flood, an innovative VLP framework supporting flood image classification, image-text retrieval, and auxiliary text classification. To overcome the limitations of existing cross-modal models, which rely on small datasets and lack robustness, we construct two specialized datasets for the first time: 1) FloodMulS for the Flood Image Classification Model (FICM), and 2) FloodIT for the Flood Text-Image Retrieval Model (FTIRM). Traditional models employ a “Hard Categorization” strategy (HC) for image classification, neglecting the impact of categorization ambiguity; to improve performance, we propose a “Soft Categorization” strategy. Furthermore, traditional models focus on unimodal (image) information and do not fully exploit joint image-text information; we address this with a “Soft Combination” that integrates FICM and FTIRM, and the combined scheme is termed SCSC. Experimental results show that SCSC improves SLIP-Flood’s performance: the F1 score of FICM rises by 7.62% over HC, and the F1 score of FTIRM rises by a further 0.35% on top of FICM. SLIP-Flood also achieves a maximum recall of 89.24% in image-text retrieval and shows promise for auxiliary flood text classification. Relevant resources are available at https://github.com/muhan-yy/SLIP-Flood.git.
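The abstract does not spell out the fusion rule behind the “Soft Combination.” A minimal sketch of one plausible reading, assuming the fused prediction is a weighted average of the FICM class distribution and a class distribution derived from FTIRM’s image-text similarity scores, is shown below. The function name, the weighting parameter `alpha`, and the softmax-over-similarities step are illustrative assumptions, not details taken from the paper.

```python
import numpy as np

def soft_combine(ficm_probs, ftirm_sims, alpha=0.7, temperature=1.0):
    """Fuse image-classifier probabilities with text-image similarity scores.

    ficm_probs : (N, C) softmax outputs of the image classifier (FICM).
    ftirm_sims : (N, C) similarities between each image and C
                 class-describing text prompts (FTIRM).
    alpha      : weight on the image branch; 1 - alpha goes to the
                 retrieval branch. Hypothetical knob, not from the paper.
    """
    # Turn similarity scores into a per-image probability distribution
    # (softmax with an optional temperature).
    sim_probs = np.exp(ftirm_sims / temperature)
    sim_probs /= sim_probs.sum(axis=1, keepdims=True)

    # Soft combination: weighted average of the two distributions,
    # rather than a hard argmax from either branch alone.
    return alpha * ficm_probs + (1.0 - alpha) * sim_probs

# Toy usage: 2 images, 3 flood-related classes.
ficm = np.array([[0.6, 0.3, 0.1],
                 [0.4, 0.4, 0.2]])
sims = np.array([[0.2, 0.5, 0.1],
                 [0.7, 0.1, 0.0]])
fused = soft_combine(ficm, sims)
print(fused.argmax(axis=1))  # final class per image after fusion
```

The weighted average keeps both branches’ uncertainty in play until the final decision, which is one way a soft scheme could mitigate the categorization ambiguity that a hard argmax discards.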
Journal introduction:
The International Journal of Applied Earth Observation and Geoinformation publishes original papers that utilize earth observation data for natural resource and environmental inventory and management. These data primarily originate from remote sensing platforms, including satellites and aircraft, supplemented by surface and subsurface measurements. Addressing natural resources such as forests, agricultural land, soils, and water, as well as environmental concerns like biodiversity, land degradation, and hazards, the journal explores conceptual and data-driven approaches. It covers geoinformation themes like capturing, databasing, visualization, interpretation, data quality, and spatial uncertainty.