Domain disentanglement and fusion based on hyperbolic neural networks for zero-shot sketch-based image retrieval
Qing Zhang, Jing Zhang, Xiangdong Su, Yonghe Wang, Feilong Bao, Guanglai Gao
Information Processing & Management, Volume 62, Issue 1, Article 103963. Published 2024-11-15.
DOI: 10.1016/j.ipm.2024.103963
URL: https://www.sciencedirect.com/science/article/pii/S0306457324003224
Impact factor: 7.4. JCR: Q1 (Computer Science, Information Systems).
Abstract
Despite progress in zero-shot sketch-based image retrieval (ZS-SBIR), existing methods still face two major challenges. First, Euclidean space cannot effectively represent hierarchically structured data, which leads to non-discriminative retrieval features. Second, visual information alone is insufficient to align cross-domain features and maximize their domain generalization capability. To tackle these issues, this paper designs a ZS-SBIR framework based on hyperbolic neural networks that combines domain disentanglement and fusion learning, called “DDFUS”. Specifically, we present a contrastive cross-modal learning method that guides the alignment of multi-domain visual representations with semantic representations in hyperbolic space, so that each visual representation carries rich information about the semantic hierarchy. Furthermore, we propose a domain disentanglement method based on hyperbolic neural networks that employs paired hyperbolic encoders to decompose the representation of each domain into domain-invariant and domain-specific features, reducing interference between domains. Moreover, we design a cross-domain fusion method that promotes the fusion and exchange of multi-domain information through the reconstruction and generation of cross-domain samples, which significantly enhances the representation and generalization capabilities of the domain-invariant features. Comprehensive experiments demonstrate that the mAP@all of our DDFUS model surpasses CNN-based models by 18.99% on the Sketchy dataset, 1.93% on the more difficult TU-Berlin dataset, and 11.4% on the more challenging QuickDraw dataset.
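To make the geometric motivation concrete, the following is a minimal sketch of how features can be placed in, and compared within, a hyperbolic space. It assumes the Poincaré ball model with curvature -c; the abstract does not state which hyperbolic model DDFUS uses, and the function names `exp_map_origin` and `poincare_distance` are hypothetical names chosen for illustration, not taken from the paper.

```python
import math


def exp_map_origin(v, c=1.0):
    """Map a Euclidean (tangent) vector at the origin onto the Poincare ball.

    Assumption: Poincare ball of curvature -c (not confirmed by the abstract).
    """
    norm = math.sqrt(sum(x * x for x in v))
    if norm == 0.0:
        return list(v)
    sqrt_c = math.sqrt(c)
    # tanh keeps the mapped point strictly inside the ball of radius 1/sqrt(c).
    scale = math.tanh(sqrt_c * norm) / (sqrt_c * norm)
    return [scale * x for x in v]


def poincare_distance(x, y, c=1.0):
    """Geodesic distance between two points inside the Poincare ball."""
    sq_x = sum(a * a for a in x)
    sq_y = sum(b * b for b in y)
    sq_diff = sum((a - b) ** 2 for a, b in zip(x, y))
    denom = (1.0 - c * sq_x) * (1.0 - c * sq_y)
    # Clamp at 1.0 to guard against rounding just below the acosh domain.
    arg = max(1.0, 1.0 + 2.0 * c * sq_diff / denom)
    return math.acosh(arg) / math.sqrt(c)


if __name__ == "__main__":
    # Two pairs with the same Euclidean gap (0.1), one near the origin and
    # one near the boundary of the ball.
    near_origin = poincare_distance([0.0, 0.0], [0.1, 0.0])
    near_boundary = poincare_distance([0.85, 0.0], [0.95, 0.0])
    print(near_origin, near_boundary)
```

In this model, distances grow rapidly toward the boundary of the ball, so the same Euclidean gap corresponds to a much larger hyperbolic distance there. This is the property usually cited for embedding tree-like hierarchies with low distortion, which is the limitation of Euclidean space that the abstract points to.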
About the journal:
Information Processing and Management is dedicated to publishing cutting-edge original research at the convergence of computing and information science. Our scope encompasses theory, methods, and applications across various domains, including advertising, business, health, information science, information technology marketing, and social computing.
We aim to cater to the interests of both primary researchers and practitioners by offering an effective platform for the timely dissemination of advanced and topical issues in this interdisciplinary field. The journal places particular emphasis on original research articles, research survey articles, research method articles, and articles addressing critical applications of research. Join us in advancing knowledge and innovation at the intersection of computing and information science.