Qibing Qin;Yadong Huo;Wenfeng Zhang;Lei Huang;Jie Nie
IEEE Transactions on Circuits and Systems for Video Technology, vol. 35, no. 10, pp. 10557-10570. Published 2025-03-14. DOI: 10.1109/TCSVT.2025.3570128. Impact factor: 11.1 (JCR Q1, Engineering, Electrical & Electronic). https://ieeexplore.ieee.org/document/11003934/
Deep Discriminative Boundary Hashing for Cross-Modal Retrieval
Owing to its efficiency in storage and computation, deep cross-modal hashing has gained much attention in large-scale multimedia retrieval. Current deep hashing methods use the probability output of a likelihood function, namely Sigmoid or Cauchy, to quantify the semantic similarity between samples in a common Hamming space. However, the inherent weakness of the Sigmoid and Cauchy likelihood functions in gradient optimization means that hashing models fail to describe the Hamming ball exactly. Because the Hamming ball marks the absolute semantic boundary among classes, this failure produces high neighborhood ambiguity. In this paper, by analyzing the likelihood function from the perspective of similarity metric learning, we propose the Deep Discriminative Boundary Hashing (DDBH) framework, which learns a discriminative embedding space that separates neighbors from non-neighbors. Specifically, by introducing a remapping strategy and base-point adaptive selection, we propose a boundary-preserving loss based on an adjustable likelihood function. This loss projects data points with small gradients to regions with large gradients and gives larger gradients to hard samples, which facilitates better separation among classes. Meanwhile, to learn class-dependent binary codes, a class-wise quantization loss is designed to heuristically transfer class-wise prior knowledge to the binary quantization, significantly improving the discriminative capability of the compact discrete codes. Comprehensive experiments on three benchmark datasets show that the proposed DDBH framework outperforms other representative deep cross-modal hashing methods. The corresponding code is available at https://github.com/QinLab-WFU/DDBH
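To make the gradient argument in the abstract concrete, the sketch below compares the Sigmoid and Cauchy likelihoods as they are commonly written in earlier deep hashing work, as functions of the Hamming distance d between two K-bit codes. It is an illustration only and is not taken from the DDBH code: the function names, the code length K = 64, and the Cauchy scale gamma = 10 are assumptions, and the adjustable likelihood function proposed in the paper is not reproduced here.

```python
import math


def sigmoid(x):
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def sigmoid_likelihood(d, K):
    """P(similar) = sigmoid(0.5 * <h_i, h_j>).

    For codes in {-1, +1}^K, <h_i, h_j> = K - 2d, so the argument is K/2 - d.
    """
    return sigmoid(K / 2.0 - d)


def cauchy_likelihood(d, gamma):
    """P(similar) = gamma / (gamma + d); gamma is an assumed scale parameter."""
    return gamma / (gamma + d)


def sigmoid_grad(d, K):
    """d/dd of -log sigmoid(K/2 - d), which equals sigmoid(d - K/2)."""
    return sigmoid(d - K / 2.0)


def cauchy_grad(d, gamma):
    """d/dd of -log(gamma / (gamma + d)), which equals 1 / (gamma + d)."""
    return 1.0 / (gamma + d)


if __name__ == "__main__":
    K, gamma = 64, 10.0  # assumed values, for illustration only
    print("d   sigmoid_grad   cauchy_grad")
    for d in (0, 2, 8, 16, 32, 48):
        print(f"{d:<3d} {sigmoid_grad(d, K):.3e}      {cauchy_grad(d, gamma):.3e}")
```

Under these assumptions, the gradient of the Sigmoid negative log-likelihood for a similar pair is vanishingly small when d is well below K/2, so similar pairs that are only a few bits apart receive almost no pull toward each other. The Cauchy gradient stays larger at small d but decays as d grows. Both behaviors are instances of the kind of gradient weakness the abstract refers to.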
About the journal:
The IEEE Transactions on Circuits and Systems for Video Technology (TCSVT) is dedicated to covering all aspects of video technologies from a circuits and systems perspective. We encourage submissions of general, theoretical, and application-oriented papers related to image and video acquisition, representation, presentation, and display. Additionally, we welcome contributions in areas such as processing, filtering, and transforms; analysis and synthesis; learning and understanding; compression, transmission, communication, and networking; as well as storage, retrieval, indexing, and search. Furthermore, papers focusing on hardware and software design and implementation are highly valued. Join us in advancing the field of video technology through innovative research and insights.