Exploring Masked Autoencoders for Sensor-Agnostic Image Retrieval in Remote Sensing

IF 8.6 · Zone 1, Earth Sciences · JCR Q1, ENGINEERING, ELECTRICAL & ELECTRONIC
Jakob Hackstein;Gencer Sumbul;Kai Norman Clasen;Begüm Demir
DOI: 10.1109/TGRS.2024.3517150
Journal: IEEE Transactions on Geoscience and Remote Sensing, vol. 63, pp. 1-14
Published: 2024-12-13
Publication type: Journal Article
Source: https://ieeexplore.ieee.org/document/10798628/
Citations: 0

Abstract

Self-supervised learning through masked autoencoders (MAEs) has recently attracted great attention for remote sensing (RS) image representation learning (IRL), and thus holds significant potential for content-based image retrieval (CBIR) from ever-growing RS image archives. However, existing MAE-based CBIR studies in RS assume that the considered RS images are acquired by a single image sensor, and are therefore suitable only for unimodal CBIR problems. The effectiveness of MAEs for cross-sensor CBIR, which aims to search for semantically similar images across different image modalities, has not yet been explored. In this article, we take the first step toward exploring the effectiveness of MAEs for sensor-agnostic CBIR in RS. To this end, we present a systematic overview of the possible adaptations of the vanilla MAE to exploit masked image modeling (MIM) on multisensor RS image archives (denoted as cross-sensor masked autoencoders (CSMAEs)) in the context of CBIR. Based on different adjustments applied to the vanilla MAE, we introduce different CSMAE models. We also provide an extensive experimental analysis of these CSMAE models. Finally, we derive a guideline for exploiting MIM for unimodal and cross-modal CBIR problems in RS. The code of this work is publicly available at https://github.com/jakhac/CSMAE.
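To make the two ideas named in the abstract concrete, the following is a minimal sketch, not the authors' CSMAE implementation. It shows (1) random patch masking as used in masked image modeling and (2) cross-sensor retrieval, where a query embedding from one sensor is ranked against archive embeddings from another by cosine similarity. The function names, the 0.75 mask ratio, the SAR and optical sensor labels, and the toy embeddings are all assumptions made for illustration; in practice the embeddings would come from a trained encoder.

```python
import math
import random


def random_patch_mask(num_patches, mask_ratio, seed=0):
    """Return sorted indices of the patches hidden from the encoder."""
    rng = random.Random(seed)
    num_masked = int(num_patches * mask_ratio)
    return sorted(rng.sample(range(num_patches), num_masked))


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def retrieve(query, archive, k=2):
    """Return the k archive keys most similar to the query embedding."""
    ranked = sorted(archive, key=lambda name: cosine(query, archive[name]),
                    reverse=True)
    return ranked[:k]


# Hypothetical embeddings: the query comes from one sensor (assumed SAR),
# the archive from another (assumed optical), in a shared feature space.
query_sar = [0.9, 0.1, 0.0]
archive_optical = {
    "forest": [0.8, 0.2, 0.1],
    "urban": [0.0, 0.1, 0.9],
    "water": [0.1, 0.9, 0.1],
}

masked = random_patch_mask(num_patches=16, mask_ratio=0.75)  # assumed ratio
top = retrieve(query_sar, archive_optical, k=2)
print(len(masked), top)
```

The retrieval step only works across sensors if both encoders map into a common embedding space; how to obtain such a space is the question the CSMAE variants address, and this sketch does not attempt to answer it.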
Source journal
IEEE Transactions on Geoscience and Remote Sensing (category: Engineering and Technology, Geochemistry and Geophysics)
CiteScore: 11.50
Self-citation rate: 28.00%
Articles published: 1912
Review time: 4.0 months
About the journal: IEEE Transactions on Geoscience and Remote Sensing (TGRS) is a monthly publication that focuses on the theory, concepts, and techniques of science and engineering as applied to sensing the land, oceans, atmosphere, and space; and the processing, interpretation, and dissemination of this information.