Deep adaptive gradient-triplet hashing for cross-modal retrieval

Congcong Zhu, Wei Hu, Jinkui Hou, Qibing Qin, Wenfeng Zhang, Lei Huang

Expert Systems with Applications, Volume 291, Article 128566 (published 2025-06-11)
DOI: 10.1016/j.eswa.2025.128566
URL: https://www.sciencedirect.com/science/article/pii/S0957417425021852
Citations: 0
Abstract
Due to its low storage cost and high computational efficiency, deep cross-modal hashing holds broad application prospects for large-scale cross-modal retrieval. However, the fixed gradients and manually set similarity margins in the traditional triplet loss hinder the model's ability to adapt to varying sample difficulties, leading to poor discrimination of hard negatives and degraded hash code quality, especially when positive-negative distances exceed the preset margin. In addition, most deep cross-modal hashing methods learn similarity and quantization jointly, and the interaction between the two can distort the embedding, resulting in sub-optimal hash codes. In this paper, the Deep Adaptive Gradient-triplet Hashing (DAGtH) framework is proposed to embed heterogeneous modality data into a discriminative discrete space and capture neighborhood relationships from the original space. Specifically, by assigning suitable gradients to triplets of different hardness, a new adaptive gradient-triplet loss is proposed to preserve the consistency of neighborhood relationships in the original space, promoting intra-class compactness and inter-class separability across heterogeneous modalities. Meanwhile, by dividing the learning process into two stages, the Householder quantization loss is introduced into cross-modal retrieval to reduce the lossy compression caused by quantization. First, similarity learning is performed in the embedding space. Second, an orthogonal transformation is optimized to reduce the distance between the embeddings and the discrete binary codes. To validate the effectiveness of the proposed DAGtH framework, comprehensive experiments are conducted on three benchmark datasets. Our approach achieves an improvement of 0.61%–13.8% in mean average precision (mAP) at different bit lengths compared to state-of-the-art hashing methods, demonstrating that DAGtH achieves superior retrieval performance. The code for our DAGtH framework can be found here: https://github.com/QinLab-WFU/OUR-DAGtH.
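The abstract does not give the exact formulas, but the two ideas above can be illustrated with a minimal NumPy sketch. Everything here is hypothetical: the hardness weighting `1 - exp(-beta * violation)`, the `beta` parameter, and the single-reflection rotation are stand-ins for the paper's actual adaptive gradient-triplet loss and Householder quantization, chosen only to show the general mechanism (harder triplets contribute larger gradients; an orthogonal map is applied before binarization so that `sign()` loses less information).

```python
import numpy as np

def adaptive_triplet_loss(anchor, positive, negative, margin=0.5, beta=2.0):
    """Hard-aware triplet loss sketch: triplets that violate the margin by
    more receive larger weights, so hard triplets yield larger gradients
    while easy (non-violating) triplets contribute nothing."""
    d_ap = np.sum((anchor - positive) ** 2, axis=1)    # anchor-positive distances
    d_an = np.sum((anchor - negative) ** 2, axis=1)    # anchor-negative distances
    violation = np.maximum(d_ap - d_an + margin, 0.0)  # standard triplet hinge
    weight = 1.0 - np.exp(-beta * violation)           # grows with triplet hardness
    return float(np.mean(weight * violation))

def householder(v):
    """Orthogonal Householder reflection H = I - 2 v v^T / (v^T v)."""
    v = v / np.linalg.norm(v)
    return np.eye(v.size) - 2.0 * np.outer(v, v)

# Two-stage idea from the abstract: stage 1 produces real-valued embeddings Z
# via similarity learning; stage 2 optimizes an orthogonal map R (here a single
# random reflection, purely for illustration) before binarizing with sign().
Z = np.random.randn(8, 4)             # stand-in for learned embeddings
R = householder(np.random.randn(4))   # orthogonal rotation (R @ R.T == I)
B = np.sign(Z @ R)                    # final binary hash codes in {-1, +1}
```

Because Householder reflections are exactly orthogonal, composing and optimizing them preserves pairwise distances in the embedding space, which is why such transformations are a natural parameterization for reducing quantization error.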
About the journal:
Expert Systems With Applications is an international journal dedicated to the exchange of information on expert and intelligent systems used globally in industry, government, and universities. The journal emphasizes original papers covering the design, development, testing, implementation, and management of these systems, offering practical guidelines. It spans various sectors such as finance, engineering, marketing, law, project management, information management, medicine, and more. The journal also welcomes papers on multi-agent systems, knowledge management, neural networks, knowledge discovery, data mining, and other related areas, excluding applications to military/defense systems.