Deep adaptive gradient-triplet hashing for cross-modal retrieval

Congcong Zhu, Wei Hu, Jinkui Hou, Qibing Qin, Wenfeng Zhang, Lei Huang

Expert Systems with Applications, Volume 291, Article 128566 (published 2025-06-11)
DOI: 10.1016/j.eswa.2025.128566
URL: https://www.sciencedirect.com/science/article/pii/S0957417425021852
Citations: 0
Abstract
Due to its low storage cost and high computational efficiency, deep cross-modal hashing holds broad application prospects for large-scale cross-modal retrieval. However, the fixed gradients and manually set similarity margins in the traditional triplet loss hinder the model's ability to adapt to varying sample difficulties, leading to poor discrimination of hard negatives and degraded hash code quality, especially when positive-negative distances exceed the preset margin. In addition, most deep cross-modal hashing methods learn similarity and quantization jointly, and the interaction between the two can distort the embedding, resulting in sub-optimal hash codes. In this paper, the Deep Adaptive Gradient-triplet Hashing (DAGtH) framework is proposed to embed heterogeneous modality data into a discriminative discrete space and capture neighborhood relationships from the original space. Specifically, by assigning suitable gradients to triplets of different hardness, a new adaptive gradient-triplet loss is proposed to preserve the consistency of neighborhood relationships in the original space, promoting intra-class compactness and inter-class separability across heterogeneous modalities. Meanwhile, by dividing the learning process into two stages, the Householder quantization loss is introduced into cross-modal retrieval to reduce the lossy compression caused by quantization. First, similarity learning is performed in the embedding space. Second, an orthogonal transformation is optimized to reduce the distance between the embeddings and the discrete binary codes. To validate the effectiveness of the proposed DAGtH framework, comprehensive experiments are conducted on three benchmark datasets. Our approach achieves an improvement of 0.61%–13.8% in mean average precision (mAP) at different bit lengths compared to state-of-the-art hashing methods, demonstrating that DAGtH achieves superior retrieval performance. The code for our DAGtH framework can be found here: https://github.com/QinLab-WFU/OUR-DAGtH.
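The abstract does not give the exact formulas, but the two ideas above can be illustrated with a minimal NumPy sketch. Everything here is hypothetical: the hardness weighting `1 - exp(-beta * violation)`, the `beta` parameter, and the single-reflection rotation are stand-ins for the paper's actual adaptive gradient-triplet loss and Householder quantization, chosen only to show the general mechanism (harder triplets contribute larger gradients; an orthogonal map is applied before binarization so that `sign()` loses less information).

```python
import numpy as np

def adaptive_triplet_loss(anchor, positive, negative, margin=0.5, beta=2.0):
    """Hard-aware triplet loss sketch: triplets that violate the margin by
    more receive larger weights, so hard triplets yield larger gradients
    while easy (non-violating) triplets contribute nothing."""
    d_ap = np.sum((anchor - positive) ** 2, axis=1)    # anchor-positive distances
    d_an = np.sum((anchor - negative) ** 2, axis=1)    # anchor-negative distances
    violation = np.maximum(d_ap - d_an + margin, 0.0)  # standard triplet hinge
    weight = 1.0 - np.exp(-beta * violation)           # grows with triplet hardness
    return float(np.mean(weight * violation))

def householder(v):
    """Orthogonal Householder reflection H = I - 2 v v^T / (v^T v)."""
    v = v / np.linalg.norm(v)
    return np.eye(v.size) - 2.0 * np.outer(v, v)

# Two-stage idea from the abstract: stage 1 produces real-valued embeddings Z
# via similarity learning; stage 2 optimizes an orthogonal map R (here a single
# random reflection, purely for illustration) before binarizing with sign().
Z = np.random.randn(8, 4)             # stand-in for learned embeddings
R = householder(np.random.randn(4))   # orthogonal rotation (R @ R.T == I)
B = np.sign(Z @ R)                    # final binary hash codes in {-1, +1}
```

Because Householder reflections are exactly orthogonal, composing and optimizing them preserves pairwise distances in the embedding space, which is why such transformations are a natural parameterization for reducing quantization error.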
About the journal:
Expert Systems With Applications is an international journal dedicated to the exchange of information on expert and intelligent systems used globally in industry, government, and universities. The journal emphasizes original papers covering the design, development, testing, implementation, and management of these systems, offering practical guidelines. It spans various sectors such as finance, engineering, marketing, law, project management, information management, medicine, and more. The journal also welcomes papers on multi-agent systems, knowledge management, neural networks, knowledge discovery, data mining, and other related areas, excluding applications to military/defense systems.