跨模态检索的深度判别边界哈希

IF 11.1 1区工程技术 Q1 ENGINEERING, ELECTRICAL & ELECTRONIC

IEEE Transactions on Circuits and Systems for Video Technology Pub Date : 2025-03-14 DOI:10.1109/TCSVT.2025.3570128

Qibing Qin;Yadong Huo;Wenfeng Zhang;Lei Huang;Jie Nie

{"title":"跨模态检索的深度判别边界哈希","authors":"Qibing Qin;Yadong Huo;Wenfeng Zhang;Lei Huang;Jie Nie","doi":"10.1109/TCSVT.2025.3570128","DOIUrl":null,"url":null,"abstract":"By the preferable efficiency in storage and computation, deep cross-modal has gained much attention in large-scale multimedia retrieval. Current deep hashing employs the probability outputs of the likelihood function, i.e., Sigmoid or Cauchy, to quantify the semantic similarity between samples in a common Hamming space. However, the inherent weakness of the Sigmoid likelihood function or the Cauchy likelihood function in gradient optimization leads to hashing models failing to exactly describe the hamming ball, which indicates the absolute semantic boundary among classes, thereby giving the high neighborhood ambiguity. In this paper, with the analysis of the likelihood function from the perspective of similarity metric learning, the novel Deep Discriminative Boundary Hashing framework (DDBH) is proposed to learn the discriminative embedding space that separates neighbors and non-neighbors well. Specifically, by introducing the remapping strategy and the base-point adaptive selection, the boundary-preserving loss based on the adjustable likelihood function is proposed to project data points with small gradients to regions with large gradients and give larger gradients for hard samples, facilitating better separation among classes. Meanwhile, to learn class-dependent binary codes, the class-wise quantization loss is designed to heuristically transfer class-wise prior knowledge to the binary quantization, significantly improving the discriminative capability of compact discrete codes. Comprehensive experiments on three benchmark datasets show that our proposed DDBH framework outperforms other representative deep cross-modal hashing. The corresponding code is available at <uri>https://github.com/QinLab-WFU/DDBH</uri>","PeriodicalId":13082,"journal":{"name":"IEEE Transactions on Circuits and Systems for Video Technology","volume":"35 10","pages":"10557-10570"},"PeriodicalIF":11.1000,"publicationDate":"2025-03-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"Deep Discriminative Boundary Hashing for Cross-Modal Retrieval\",\"authors\":\"Qibing Qin;Yadong Huo;Wenfeng Zhang;Lei Huang;Jie Nie\",\"doi\":\"10.1109/TCSVT.2025.3570128\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"By the preferable efficiency in storage and computation, deep cross-modal has gained much attention in large-scale multimedia retrieval. Current deep hashing employs the probability outputs of the likelihood function, i.e., Sigmoid or Cauchy, to quantify the semantic similarity between samples in a common Hamming space. However, the inherent weakness of the Sigmoid likelihood function or the Cauchy likelihood function in gradient optimization leads to hashing models failing to exactly describe the hamming ball, which indicates the absolute semantic boundary among classes, thereby giving the high neighborhood ambiguity. In this paper, with the analysis of the likelihood function from the perspective of similarity metric learning, the novel Deep Discriminative Boundary Hashing framework (DDBH) is proposed to learn the discriminative embedding space that separates neighbors and non-neighbors well. Specifically, by introducing the remapping strategy and the base-point adaptive selection, the boundary-preserving loss based on the adjustable likelihood function is proposed to project data points with small gradients to regions with large gradients and give larger gradients for hard samples, facilitating better separation among classes. Meanwhile, to learn class-dependent binary codes, the class-wise quantization loss is designed to heuristically transfer class-wise prior knowledge to the binary quantization, significantly improving the discriminative capability of compact discrete codes. Comprehensive experiments on three benchmark datasets show that our proposed DDBH framework outperforms other representative deep cross-modal hashing. The corresponding code is available at <uri>https://github.com/QinLab-WFU/DDBH</uri>\",\"PeriodicalId\":13082,\"journal\":{\"name\":\"IEEE Transactions on Circuits and Systems for Video Technology\",\"volume\":\"35 10\",\"pages\":\"10557-10570\"},\"PeriodicalIF\":11.1000,\"publicationDate\":\"2025-03-14\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"IEEE Transactions on Circuits and Systems for Video Technology\",\"FirstCategoryId\":\"5\",\"ListUrlMain\":\"https://ieeexplore.ieee.org/document/11003934/\",\"RegionNum\":1,\"RegionCategory\":\"工程技术\",\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"Q1\",\"JCRName\":\"ENGINEERING, ELECTRICAL & ELECTRONIC\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"IEEE Transactions on Circuits and Systems for Video Technology","FirstCategoryId":"5","ListUrlMain":"https://ieeexplore.ieee.org/document/11003934/","RegionNum":1,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"ENGINEERING, ELECTRICAL & ELECTRONIC","Score":null,"Total":0}

引用次数: 0

摘要

由于具有较好的存储和计算效率，深交叉模态在大规模多媒体检索中受到越来越多的关注。当前的深度哈希算法采用似然函数的概率输出，即Sigmoid或Cauchy，来量化一个公共汉明空间中样本之间的语义相似度。然而，由于Sigmoid似然函数或Cauchy似然函数在梯度优化中的固有弱点，导致哈希模型无法准确描述表明类间绝对语义边界的汉明球，从而产生高邻域歧义。本文从相似性度量学习的角度对似然函数进行了分析，提出了一种新的深度判别边界哈希框架（Deep Discriminative Boundary hash framework， DDBH）来学习区分邻域和非邻域的判别嵌入空间。具体而言，通过引入重映射策略和基点自适应选择，提出了基于可调似然函数的保边损失，将梯度小的数据点投影到梯度大的区域，并对硬样本给出较大的梯度，从而更好地实现类间分离。同时，为了学习类相关的二进制码，设计了类智能量化损失，启发式地将类智能先验知识转移到二进制量化中，显著提高了紧凑离散码的判别能力。在三个基准数据集上的综合实验表明，我们提出的DDBH框架优于其他具有代表性的深度跨模态哈希。相应的代码可在https://github.com/QinLab-WFU/DDBH上获得

本文章由计算机程序翻译，如有差异，请以英文原文为准。

查看原文本刊更多论文

Deep Discriminative Boundary Hashing for Cross-Modal Retrieval

By the preferable efficiency in storage and computation, deep cross-modal has gained much attention in large-scale multimedia retrieval. Current deep hashing employs the probability outputs of the likelihood function, i.e., Sigmoid or Cauchy, to quantify the semantic similarity between samples in a common Hamming space. However, the inherent weakness of the Sigmoid likelihood function or the Cauchy likelihood function in gradient optimization leads to hashing models failing to exactly describe the hamming ball, which indicates the absolute semantic boundary among classes, thereby giving the high neighborhood ambiguity. In this paper, with the analysis of the likelihood function from the perspective of similarity metric learning, the novel Deep Discriminative Boundary Hashing framework (DDBH) is proposed to learn the discriminative embedding space that separates neighbors and non-neighbors well. Specifically, by introducing the remapping strategy and the base-point adaptive selection, the boundary-preserving loss based on the adjustable likelihood function is proposed to project data points with small gradients to regions with large gradients and give larger gradients for hard samples, facilitating better separation among classes. Meanwhile, to learn class-dependent binary codes, the class-wise quantization loss is designed to heuristically transfer class-wise prior knowledge to the binary quantization, significantly improving the discriminative capability of compact discrete codes. Comprehensive experiments on three benchmark datasets show that our proposed DDBH framework outperforms other representative deep cross-modal hashing. The corresponding code is available at https://github.com/QinLab-WFU/DDBH

求助全文

通过发布文献求助，成功后即可免费获取论文全文。去求助

来源期刊

IEEE Transactions on Circuits and Systems for Video Technology 工程技术-工程：电子与电气

CiteScore

13.80

自引率

27.40%

发文量

660

审稿时长

5 months

期刊介绍： The IEEE Transactions on Circuits and Systems for Video Technology (TCSVT) is dedicated to covering all aspects of video technologies from a circuits and systems perspective. We encourage submissions of general, theoretical, and application-oriented papers related to image and video acquisition, representation, presentation, and display. Additionally, we welcome contributions in areas such as processing, filtering, and transforms; analysis and synthesis; learning and understanding; compression, transmission, communication, and networking; as well as storage, retrieval, indexing, and search. Furthermore, papers focusing on hardware and software design and implementation are highly valued. Join us in advancing the field of video technology through innovative research and insights.