Exploring Masked Autoencoders for Sensor-Agnostic Image Retrieval in Remote Sensing

Jakob Hackstein, Gencer Sumbul, Kai Norman Clasen, Begüm Demir

IEEE Transactions on Geoscience and Remote Sensing, vol. 63, pp. 1-14, 2024. DOI: 10.1109/TGRS.2024.3517150. https://ieeexplore.ieee.org/document/10798628/
Self-supervised learning through masked autoencoders (MAEs) has recently attracted great attention for remote sensing (RS) image representation learning (IRL) and thus holds significant potential for content-based image retrieval (CBIR) from ever-growing RS image archives. However, existing MAE-based CBIR studies in RS assume that the considered RS images are acquired by a single image sensor and are thus only suitable for unimodal CBIR problems. The effectiveness of MAEs for cross-sensor CBIR, which aims to search semantically similar images across different image modalities, has not yet been explored. In this article, we take the first step toward exploring the effectiveness of MAEs for sensor-agnostic CBIR in RS. To this end, we present a systematic overview of the possible adaptations of the vanilla MAE to exploit masked image modeling (MIM) on multisensor RS image archives (denoted as cross-sensor masked autoencoders (CSMAEs)) in the context of CBIR. Based on different adjustments applied to the vanilla MAE, we introduce different CSMAE models. We also provide an extensive experimental analysis of these CSMAE models. We finally derive a guideline for exploiting MIM for unimodal and cross-modal CBIR problems in RS. The code of this work is publicly available at https://github.com/jakhac/CSMAE.
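To make the mechanism the abstract builds on concrete, below is a minimal sketch of MAE-style masked image modeling. This is not the authors' CSMAE code (see the repository above for that); the class name, embedding dimensions, mask ratio, and layer counts here are illustrative assumptions. The defining MAE idea is shown as-is: the encoder sees only a random subset of patch tokens, and a lightweight decoder reconstructs the raw pixels of the masked patches.

```python
# Minimal MAE-style masked image modeling sketch (illustrative, not CSMAE).
import torch
import torch.nn as nn
import torch.nn.functional as F


def random_masking(x, mask_ratio=0.75):
    """Split patch tokens into a visible subset and a binary mask (1 = masked)."""
    B, N, D = x.shape
    len_keep = int(N * (1 - mask_ratio))
    noise = torch.rand(B, N, device=x.device)      # random score per token
    ids_shuffle = noise.argsort(dim=1)             # random permutation
    ids_restore = ids_shuffle.argsort(dim=1)       # inverse permutation
    ids_keep = ids_shuffle[:, :len_keep]
    x_visible = torch.gather(x, 1, ids_keep.unsqueeze(-1).expand(-1, -1, D))
    mask = torch.ones(B, N, device=x.device)
    mask[:, :len_keep] = 0
    mask = torch.gather(mask, 1, ids_restore)      # back to original token order
    return x_visible, mask, ids_restore


class TinyMAE(nn.Module):
    """Deliberately small MAE; positional embeddings omitted for brevity."""

    def __init__(self, patch_dim=768, dim=128):
        super().__init__()
        self.embed = nn.Linear(patch_dim, dim)
        enc_layer = nn.TransformerEncoderLayer(dim, nhead=4, batch_first=True)
        dec_layer = nn.TransformerEncoderLayer(dim, nhead=4, batch_first=True)
        self.encoder = nn.TransformerEncoder(enc_layer, num_layers=2)
        self.decoder = nn.TransformerEncoder(dec_layer, num_layers=1)
        self.mask_token = nn.Parameter(torch.zeros(1, 1, dim))
        self.head = nn.Linear(dim, patch_dim)      # predicts raw patch pixels

    def forward(self, patches, mask_ratio=0.75):
        x, mask, ids_restore = random_masking(self.embed(patches), mask_ratio)
        latent = self.encoder(x)                   # encode visible tokens only
        B, N = mask.shape
        mask_tokens = self.mask_token.expand(B, N - latent.shape[1], -1)
        full = torch.cat([latent, mask_tokens], dim=1)
        full = torch.gather(                       # restore original order
            full, 1, ids_restore.unsqueeze(-1).expand(-1, -1, full.shape[-1]))
        pred = self.head(self.decoder(full))       # reconstruct every patch
        loss = F.mse_loss(pred, patches, reduction="none").mean(-1)
        return (loss * mask).sum() / mask.sum()    # loss on masked patches only


# Usage: a batch of 8 images, each as 196 flattened 16x16x3 patches (= 768 dims).
model = TinyMAE()
loss = model(torch.randn(8, 196, 768))
loss.backward()
```

A cross-sensor variant in the spirit of the abstract would adjust this recipe rather than replace it, e.g. with sensor-specific patch embeddings and reconstruction targets drawn from a co-registered image of a second sensor, so that the learned encoder embeddings become comparable across modalities for retrieval. The specific adjustments studied in the paper are detailed in the linked repository.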
Journal Introduction:
IEEE Transactions on Geoscience and Remote Sensing (TGRS) is a monthly publication that focuses on the theory, concepts, and techniques of science and engineering as applied to sensing the land, oceans, atmosphere, and space; and the processing, interpretation, and dissemination of this information.