Token Aggregation and Selection Hashing for Efficient Underwater Image Retrieval

IF 3.9 | CAS Tier 2 (Engineering & Technology) | JCR Q2, ENGINEERING, ELECTRICAL & ELECTRONIC
Shishi Qiao;Benqian Lin;Guanren Bu;Shuai Yuan;Haiyong Zheng
{"title":"Token Aggregation and Selection Hashing for Efficient Underwater Image Retrieval","authors":"Shishi Qiao;Benqian Lin;Guanren Bu;Shuai Yuan;Haiyong Zheng","doi":"10.1109/LSP.2025.3605283","DOIUrl":null,"url":null,"abstract":"The scarcity of large-scale annotated data poses significant challenges for underwater visual analysis. Deep hashing methods offer promising solutions for efficient large-scale image retrieval tasks due to their exceptional computational and storage efficiency. However, underwater images suffer from inherent degradation (e.g., low contrast, color distortion), complex background noise, and fine-grained semantic distinctions, severely hindering the discriminability of learned hash codes. To address these issues, we propose Token Aggregation and Selection Hashing (TASH), the first deep hashing framework specifically designed for underwater image retrieval. Built upon a teacher-student self-distillation Vision Transformer (ViT) architecture, TASH incorporates three key innovations: (1) An Underwater Image Augmentation (UIA) module that simulates realistic degradation patterns (e.g., color shifts) to augment the student branch’s input, explicitly enhancing model robustness to the diverse distortions encountered underwater; (2) A Multi-layer Token Aggregation (MTA) module that fuses features across layers, capturing hierarchical contextual information crucial for overcoming low contrast and resolving ambiguities in degraded underwater scenes; and (3) An Attention-based Token Selection (ATS) module that dynamically identifies and emphasizes the most discriminative tokens, eliminating the effect of background noise and enabling extracting subtle yet critical visual cues for distinguishing fine-grained underwater species. The resulting discriminative real-valued features are compressed into compact binary codes via a dedicated hash layer. Extensive experiments on two underwater datasets demonstrate that TASH significantly outperforms state-of-the-art methods, establishing new benchmarks for efficient and accurate underwater image retrieval.","PeriodicalId":13154,"journal":{"name":"IEEE Signal Processing Letters","volume":"32 ","pages":"3545-3549"},"PeriodicalIF":3.9000,"publicationDate":"2025-09-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"IEEE Signal Processing Letters","FirstCategoryId":"5","ListUrlMain":"https://ieeexplore.ieee.org/document/11146755/","RegionNum":2,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q2","JCRName":"ENGINEERING, ELECTRICAL & ELECTRONIC","Score":null,"Total":0}
Citations: 0

Abstract

The scarcity of large-scale annotated data poses significant challenges for underwater visual analysis. Deep hashing methods offer promising solutions for efficient large-scale image retrieval tasks due to their exceptional computational and storage efficiency. However, underwater images suffer from inherent degradation (e.g., low contrast, color distortion), complex background noise, and fine-grained semantic distinctions, severely hindering the discriminability of learned hash codes. To address these issues, we propose Token Aggregation and Selection Hashing (TASH), the first deep hashing framework specifically designed for underwater image retrieval. Built upon a teacher-student self-distillation Vision Transformer (ViT) architecture, TASH incorporates three key innovations: (1) An Underwater Image Augmentation (UIA) module that simulates realistic degradation patterns (e.g., color shifts) to augment the student branch’s input, explicitly enhancing model robustness to the diverse distortions encountered underwater; (2) A Multi-layer Token Aggregation (MTA) module that fuses features across layers, capturing hierarchical contextual information crucial for overcoming low contrast and resolving ambiguities in degraded underwater scenes; and (3) An Attention-based Token Selection (ATS) module that dynamically identifies and emphasizes the most discriminative tokens, eliminating the effect of background noise and enabling the extraction of subtle yet critical visual cues for distinguishing fine-grained underwater species. The resulting discriminative real-valued features are compressed into compact binary codes via a dedicated hash layer. Extensive experiments on two underwater datasets demonstrate that TASH significantly outperforms state-of-the-art methods, establishing new benchmarks for efficient and accurate underwater image retrieval.
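Since the abstract only sketches the architecture, a minimal PyTorch illustration of two of its ideas may help: attention-based token selection (ATS) and the final hash layer. This is a hedged sketch based solely on the abstract's description, not the authors' code; the class names, the keep ratio, the pooling choice, and the tanh relaxation are all assumptions.

```python
# Minimal sketch (not the authors' code) of attention-based token
# selection and a tanh-relaxed hash layer, as described in the abstract.
# All names, shapes, and hyperparameters below are illustrative assumptions.
import torch
import torch.nn as nn


class AttentionTokenSelection(nn.Module):
    """Keep only the top-k patch tokens, ranked by CLS-token attention."""

    def __init__(self, keep_ratio: float = 0.5):
        super().__init__()
        self.keep_ratio = keep_ratio  # fraction of tokens to retain (assumed)

    def forward(self, tokens: torch.Tensor, cls_attn: torch.Tensor) -> torch.Tensor:
        # tokens:   (B, N, D) patch tokens from a ViT layer
        # cls_attn: (B, N)    attention weight of the CLS token on each patch
        k = max(1, int(tokens.size(1) * self.keep_ratio))
        idx = cls_attn.topk(k, dim=1).indices                    # (B, k)
        idx = idx.unsqueeze(-1).expand(-1, -1, tokens.size(-1))  # (B, k, D)
        return tokens.gather(1, idx)                             # (B, k, D)


class HashLayer(nn.Module):
    """Linear projection to hash logits; tanh is a common sign() relaxation."""

    def __init__(self, dim: int = 768, bits: int = 64):
        super().__init__()
        self.fc = nn.Linear(dim, bits)

    def forward(self, feats: torch.Tensor) -> torch.Tensor:
        return torch.tanh(self.fc(feats))  # binarize with torch.sign at test time


# Toy usage: keep half the tokens, mean-pool them, hash to 64 bits.
B, N, D = 2, 196, 768
tokens, cls_attn = torch.randn(B, N, D), torch.rand(B, N)
selected = AttentionTokenSelection(0.5)(tokens, cls_attn)
codes = HashLayer(D, 64)(selected.mean(dim=1))
print(codes.shape)  # torch.Size([2, 64])
```

At retrieval time the tanh outputs would typically be binarized with torch.sign and compared by Hamming distance, which is where the storage and computation savings cited in the abstract come from.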
Source Journal
IEEE Signal Processing Letters (Engineering & Technology; Engineering: Electrical & Electronic)
CiteScore: 7.40
Self-citation rate: 12.80%
Annual publications: 339
Review time: 2.8 months
Journal description: The IEEE Signal Processing Letters is a monthly, archival publication designed to provide rapid dissemination of original, cutting-edge ideas and timely, significant contributions in signal, image, speech, language, and audio processing. Papers published in the Letters can be presented within one year of their appearance at signal processing conferences such as ICASSP, GlobalSIP, and ICIP, and also at several workshops organized by the Signal Processing Society.