Token Aggregation and Selection Hashing for Efficient Underwater Image Retrieval

IF 3.9 | CAS Tier 2 (Engineering & Technology) | JCR Q2, ENGINEERING, ELECTRICAL & ELECTRONIC
Shishi Qiao;Benqian Lin;Guanren Bu;Shuai Yuan;Haiyong Zheng
{"title":"Token Aggregation and Selection Hashing for Efficient Underwater Image Retrieval","authors":"Shishi Qiao;Benqian Lin;Guanren Bu;Shuai Yuan;Haiyong Zheng","doi":"10.1109/LSP.2025.3605283","DOIUrl":null,"url":null,"abstract":"The scarcity of large-scale annotated data poses significant challenges for underwater visual analysis. Deep hashing methods offer promising solutions for efficient large-scale image retrieval tasks due to their exceptional computational and storage efficiency. However, underwater images suffer from inherent degradation (e.g., low contrast, color distortion), complex background noise, and fine-grained semantic distinctions, severely hindering the discriminability of learned hash codes. To address these issues, we propose Token Aggregation and Selection Hashing (TASH), the first deep hashing framework specifically designed for underwater image retrieval. Built upon a teacher-student self-distillation Vision Transformer (ViT) architecture, TASH incorporates three key innovations: (1) An Underwater Image Augmentation (UIA) module that simulates realistic degradation patterns (e.g., color shifts) to augment the student branch’s input, explicitly enhancing model robustness to the diverse distortions encountered underwater; (2) A Multi-layer Token Aggregation (MTA) module that fuses features across layers, capturing hierarchical contextual information crucial for overcoming low contrast and resolving ambiguities in degraded underwater scenes; and (3) An Attention-based Token Selection (ATS) module that dynamically identifies and emphasizes the most discriminative tokens, eliminating the effect of background noise and enabling extracting subtle yet critical visual cues for distinguishing fine-grained underwater species. The resulting discriminative real-valued features are compressed into compact binary codes via a dedicated hash layer. Extensive experiments on two underwater datasets demonstrate that TASH significantly outperforms state-of-the-art methods, establishing new benchmarks for efficient and accurate underwater image retrieval.","PeriodicalId":13154,"journal":{"name":"IEEE Signal Processing Letters","volume":"32 ","pages":"3545-3549"},"PeriodicalIF":3.9000,"publicationDate":"2025-09-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"IEEE Signal Processing Letters","FirstCategoryId":"5","ListUrlMain":"https://ieeexplore.ieee.org/document/11146755/","RegionNum":2,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q2","JCRName":"ENGINEERING, ELECTRICAL & ELECTRONIC","Score":null,"Total":0}
Citations: 0

Abstract

The scarcity of large-scale annotated data poses significant challenges for underwater visual analysis. Deep hashing methods offer promising solutions for efficient large-scale image retrieval tasks due to their exceptional computational and storage efficiency. However, underwater images suffer from inherent degradation (e.g., low contrast, color distortion), complex background noise, and fine-grained semantic distinctions, severely hindering the discriminability of learned hash codes. To address these issues, we propose Token Aggregation and Selection Hashing (TASH), the first deep hashing framework specifically designed for underwater image retrieval. Built upon a teacher-student self-distillation Vision Transformer (ViT) architecture, TASH incorporates three key innovations: (1) An Underwater Image Augmentation (UIA) module that simulates realistic degradation patterns (e.g., color shifts) to augment the student branch’s input, explicitly enhancing model robustness to the diverse distortions encountered underwater; (2) A Multi-layer Token Aggregation (MTA) module that fuses features across layers, capturing hierarchical contextual information crucial for overcoming low contrast and resolving ambiguities in degraded underwater scenes; and (3) An Attention-based Token Selection (ATS) module that dynamically identifies and emphasizes the most discriminative tokens, eliminating the effect of background noise and enabling the extraction of subtle yet critical visual cues for distinguishing fine-grained underwater species. The resulting discriminative real-valued features are compressed into compact binary codes via a dedicated hash layer. Extensive experiments on two underwater datasets demonstrate that TASH significantly outperforms state-of-the-art methods, establishing new benchmarks for efficient and accurate underwater image retrieval.
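Since the abstract only sketches the architecture, a minimal PyTorch illustration of two of its ideas may help: attention-based token selection (ATS) and the final hash layer. This is a hedged sketch based solely on the abstract's description, not the authors' code; the class names, the keep ratio, the pooling choice, and the tanh relaxation are all assumptions.

```python
# Minimal sketch (not the authors' code) of attention-based token
# selection and a tanh-relaxed hash layer, as described in the abstract.
# All names, shapes, and hyperparameters below are illustrative assumptions.
import torch
import torch.nn as nn


class AttentionTokenSelection(nn.Module):
    """Keep only the top-k patch tokens, ranked by CLS-token attention."""

    def __init__(self, keep_ratio: float = 0.5):
        super().__init__()
        self.keep_ratio = keep_ratio  # fraction of tokens to retain (assumed)

    def forward(self, tokens: torch.Tensor, cls_attn: torch.Tensor) -> torch.Tensor:
        # tokens:   (B, N, D) patch tokens from a ViT layer
        # cls_attn: (B, N)    attention weight of the CLS token on each patch
        k = max(1, int(tokens.size(1) * self.keep_ratio))
        idx = cls_attn.topk(k, dim=1).indices                    # (B, k)
        idx = idx.unsqueeze(-1).expand(-1, -1, tokens.size(-1))  # (B, k, D)
        return tokens.gather(1, idx)                             # (B, k, D)


class HashLayer(nn.Module):
    """Linear projection to hash logits; tanh is a common sign() relaxation."""

    def __init__(self, dim: int = 768, bits: int = 64):
        super().__init__()
        self.fc = nn.Linear(dim, bits)

    def forward(self, feats: torch.Tensor) -> torch.Tensor:
        return torch.tanh(self.fc(feats))  # binarize with torch.sign at test time


# Toy usage: keep half the tokens, mean-pool them, hash to 64 bits.
B, N, D = 2, 196, 768
tokens, cls_attn = torch.randn(B, N, D), torch.rand(B, N)
selected = AttentionTokenSelection(0.5)(tokens, cls_attn)
codes = HashLayer(D, 64)(selected.mean(dim=1))
print(codes.shape)  # torch.Size([2, 64])
```

At retrieval time the tanh outputs would typically be binarized with torch.sign and compared by Hamming distance, which is where the storage and computation savings cited in the abstract come from.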
Source Journal
IEEE Signal Processing Letters (Engineering & Technology; Engineering: Electrical & Electronic)
CiteScore: 7.40
Self-citation rate: 12.80%
Annual publications: 339
Review time: 2.8 months
Journal description: The IEEE Signal Processing Letters is a monthly, archival publication designed to provide rapid dissemination of original, cutting-edge ideas and timely, significant contributions in signal, image, speech, language, and audio processing. Papers published in the Letters can be presented within one year of their appearance at signal processing conferences such as ICASSP, GlobalSIP, and ICIP, and also at several workshops organized by the Signal Processing Society.