{"title":"基于令牌聚合和选择哈希的高效水下图像检索","authors":"Shishi Qiao;Benqian Lin;Guanren Bu;Shuai Yuan;Haiyong Zheng","doi":"10.1109/LSP.2025.3605283","DOIUrl":null,"url":null,"abstract":"The scarcity of large-scale annotated data poses significant challenges for underwater visual analysis. Deep hashing methods offer promising solutions for efficient large-scale image retrieval tasks due to their exceptional computational and storage efficiency. However, underwater images suffer from inherent degradation (e.g., low contrast, color distortion), complex background noise, and fine-grained semantic distinctions, severely hindering the discriminability of learned hash codes. To address these issues, we propose Token Aggregation and Selection Hashing (TASH), the first deep hashing framework specifically designed for underwater image retrieval. Built upon a teacher-student self-distillation Vision Transformer (ViT) architecture, TASH incorporates three key innovations: (1) An Underwater Image Augmentation (UIA) module that simulates realistic degradation patterns (e.g., color shifts) to augment the student branch’s input, explicitly enhancing model robustness to the diverse distortions encountered underwater; (2) A Multi-layer Token Aggregation (MTA) module that fuses features across layers, capturing hierarchical contextual information crucial for overcoming low contrast and resolving ambiguities in degraded underwater scenes; and (3) An Attention-based Token Selection (ATS) module that dynamically identifies and emphasizes the most discriminative tokens, eliminating the effect of background noise and enabling extracting subtle yet critical visual cues for distinguishing fine-grained underwater species. The resulting discriminative real-valued features are compressed into compact binary codes via a dedicated hash layer. 
Extensive experiments on two underwater datasets demonstrate that TASH significantly outperforms state-of-the-art methods, establishing new benchmarks for efficient and accurate underwater image retrieval.","PeriodicalId":13154,"journal":{"name":"IEEE Signal Processing Letters","volume":"32 ","pages":"3545-3549"},"PeriodicalIF":3.9000,"publicationDate":"2025-09-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"Token Aggregation and Selection Hashing for Efficient Underwater Image Retrieval\",\"authors\":\"Shishi Qiao;Benqian Lin;Guanren Bu;Shuai Yuan;Haiyong Zheng\",\"doi\":\"10.1109/LSP.2025.3605283\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"The scarcity of large-scale annotated data poses significant challenges for underwater visual analysis. Deep hashing methods offer promising solutions for efficient large-scale image retrieval tasks due to their exceptional computational and storage efficiency. However, underwater images suffer from inherent degradation (e.g., low contrast, color distortion), complex background noise, and fine-grained semantic distinctions, severely hindering the discriminability of learned hash codes. To address these issues, we propose Token Aggregation and Selection Hashing (TASH), the first deep hashing framework specifically designed for underwater image retrieval. 
Built upon a teacher-student self-distillation Vision Transformer (ViT) architecture, TASH incorporates three key innovations: (1) An Underwater Image Augmentation (UIA) module that simulates realistic degradation patterns (e.g., color shifts) to augment the student branch’s input, explicitly enhancing model robustness to the diverse distortions encountered underwater; (2) A Multi-layer Token Aggregation (MTA) module that fuses features across layers, capturing hierarchical contextual information crucial for overcoming low contrast and resolving ambiguities in degraded underwater scenes; and (3) An Attention-based Token Selection (ATS) module that dynamically identifies and emphasizes the most discriminative tokens, eliminating the effect of background noise and enabling extracting subtle yet critical visual cues for distinguishing fine-grained underwater species. The resulting discriminative real-valued features are compressed into compact binary codes via a dedicated hash layer. Extensive experiments on two underwater datasets demonstrate that TASH significantly outperforms state-of-the-art methods, establishing new benchmarks for efficient and accurate underwater image retrieval.\",\"PeriodicalId\":13154,\"journal\":{\"name\":\"IEEE Signal Processing Letters\",\"volume\":\"32 \",\"pages\":\"3545-3549\"},\"PeriodicalIF\":3.9000,\"publicationDate\":\"2025-09-02\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"IEEE Signal Processing Letters\",\"FirstCategoryId\":\"5\",\"ListUrlMain\":\"https://ieeexplore.ieee.org/document/11146755/\",\"RegionNum\":2,\"RegionCategory\":\"工程技术\",\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"Q2\",\"JCRName\":\"ENGINEERING, ELECTRICAL & 
ELECTRONIC\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"IEEE Signal Processing Letters","FirstCategoryId":"5","ListUrlMain":"https://ieeexplore.ieee.org/document/11146755/","RegionNum":2,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q2","JCRName":"ENGINEERING, ELECTRICAL & ELECTRONIC","Score":null,"Total":0}
Token Aggregation and Selection Hashing for Efficient Underwater Image Retrieval
The scarcity of large-scale annotated data poses significant challenges for underwater visual analysis. Deep hashing methods offer promising solutions for efficient large-scale image retrieval tasks due to their exceptional computational and storage efficiency. However, underwater images suffer from inherent degradation (e.g., low contrast, color distortion), complex background noise, and fine-grained semantic distinctions, severely hindering the discriminability of learned hash codes. To address these issues, we propose Token Aggregation and Selection Hashing (TASH), the first deep hashing framework specifically designed for underwater image retrieval. Built upon a teacher-student self-distillation Vision Transformer (ViT) architecture, TASH incorporates three key innovations: (1) an Underwater Image Augmentation (UIA) module that simulates realistic degradation patterns (e.g., color shifts) to augment the student branch's input, explicitly enhancing model robustness to the diverse distortions encountered underwater; (2) a Multi-layer Token Aggregation (MTA) module that fuses features across layers, capturing hierarchical contextual information crucial for overcoming low contrast and resolving ambiguities in degraded underwater scenes; and (3) an Attention-based Token Selection (ATS) module that dynamically identifies and emphasizes the most discriminative tokens, suppressing the effect of background noise and enabling the extraction of subtle yet critical visual cues for distinguishing fine-grained underwater species. The resulting discriminative real-valued features are compressed into compact binary codes via a dedicated hash layer. Extensive experiments on two underwater datasets demonstrate that TASH significantly outperforms state-of-the-art methods, establishing new benchmarks for efficient and accurate underwater image retrieval.
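The abstract names two mechanisms that are easy to illustrate in isolation: attention-based selection of the most informative patch tokens, and a hash layer that compresses a real-valued feature into a compact binary code compared by Hamming distance. The paper's actual modules are not specified here, so the following is a minimal NumPy sketch under stated assumptions: random tensors stand in for ViT patch tokens and CLS-to-patch attention weights, and all dimensions (`num_patches`, `dim`, `code_bits`, `top_k`) are illustrative, not TASH's settings.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy stand-ins for ViT outputs: 196 patch tokens of dimension 64,
# plus the CLS token's attention distribution over those patches.
num_patches, dim, code_bits, top_k = 196, 64, 32, 16
tokens = rng.standard_normal((num_patches, dim))   # patch token features
cls_attention = rng.random(num_patches)            # CLS->patch attention weights
cls_attention /= cls_attention.sum()               # normalize to a distribution

# Attention-based token selection (ATS-style idea): keep only the top-k
# patch tokens the CLS token attends to, then pool them into one feature.
top_idx = np.argsort(cls_attention)[-top_k:]
selected_feature = tokens[top_idx].mean(axis=0)    # shape: (dim,)

# Hash layer: a linear projection followed by sign() yields a binary code
# in {-1, +1}^code_bits (at inference; training typically uses a smooth
# relaxation such as tanh, omitted here).
W = rng.standard_normal((dim, code_bits))
code = np.sign(selected_feature @ W).astype(np.int8)

# Retrieval: Hamming distance between two {-1, +1} codes via a dot product.
def hamming(a, b):
    return (len(a) - int(a @ b)) // 2

db_codes = np.sign(rng.standard_normal((5, code_bits))).astype(np.int8)
distances = [hamming(code, c) for c in db_codes]
nearest = int(np.argmin(distances))                # index of closest database item
```

The dot-product form of the Hamming distance is what makes binary codes cheap at scale: a database scan reduces to integer matrix multiplication plus a shift, with 32 bits of storage per image instead of a 64-float descriptor.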
About the journal:
The IEEE Signal Processing Letters is a monthly, archival publication designed to provide rapid dissemination of original, cutting-edge ideas and timely, significant contributions in signal, image, speech, language and audio processing. Papers published in the Letters can be presented within one year of their appearance in signal processing conferences such as ICASSP, GlobalSIP and ICIP, and also in several workshops organized by the Signal Processing Society.