Kai Wang, Yifan Wang, Xing Xu, Xin Liu, Weihua Ou, Huimin Lu
Proceedings of the 30th ACM International Conference on Multimedia, published 2022-10-10. DOI: https://doi.org/10.1145/3503161.3548382
Prototype-based Selective Knowledge Distillation for Zero-Shot Sketch Based Image Retrieval
Zero-Shot Sketch-Based Image Retrieval (ZS-SBIR) is an emerging research task that aims to retrieve data of previously unseen classes across sketches and images. It is challenging because sketches and images follow heterogeneous distributions and because semantics are inconsistent between seen and unseen classes. To transfer knowledge, recent approaches introduce knowledge distillation, which optimizes a student network using the teacher signal produced by a teacher network pre-trained on large-scale datasets. However, these methods often ignore mispredictions in the teacher signal, leaving the student vulnerable to incorrect teacher outputs. To address this issue, we propose a novel method termed Prototype-based Selective Knowledge Distillation (PSKD) for ZS-SBIR. PSKD first learns a set of prototypes to represent categories and then uses an instance-level adaptive learning strategy to strengthen the semantic relations between categories. Next, a correlation matrix tailored to the downstream task is built from the prototypes. With the learned correlation matrix, the teacher signal, given by transformers pre-trained on ImageNet and fine-tuned on the downstream dataset, can be reconstructed to weaken the impact of mispredictions and to selectively distill knowledge into the student network. Extensive experiments on three widely used datasets demonstrate that the proposed PSKD method establishes new state-of-the-art performance for ZS-SBIR on all of them.
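The abstract gives no formulas, so the following is only an illustrative sketch of the general idea: class prototypes yield a correlation matrix, and that matrix is used to soften a teacher prediction that disagrees with the ground-truth label. Everything specific here is an assumption made for illustration and is not taken from the paper: the cosine-similarity correlation, the temperature, the mixing weight `alpha`, the rule of correcting only mispredicted samples, and all function names.

```python
import math


def softmax(row, temperature=1.0):
    """Numerically stable softmax over a list of scores."""
    m = max(row)
    exps = [math.exp((v - m) / temperature) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def cosine(u, v):
    """Cosine similarity between two vectors given as lists."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def prototype_correlation(prototypes, temperature=0.1):
    """Assumed form: row i is a softmax over cosine similarities
    between prototype i and every class prototype."""
    return [
        softmax([cosine(p, q) for q in prototypes], temperature)
        for p in prototypes
    ]


def reconstruct_teacher_signal(teacher_probs, label, corr, alpha=0.5):
    """Assumed rule: keep the teacher distribution when its top class
    matches the label; otherwise mix it with the label's correlation row."""
    predicted = max(range(len(teacher_probs)), key=teacher_probs.__getitem__)
    if predicted == label:
        return list(teacher_probs)
    prior = corr[label]
    return [(1 - alpha) * t + alpha * p for t, p in zip(teacher_probs, prior)]


def kl_divergence(target, student, eps=1e-12):
    """KL(target || student), the usual distillation objective."""
    return sum(
        t * (math.log(t + eps) - math.log(s + eps))
        for t, s in zip(target, student)
    )


# Toy example: classes 0 and 1 have similar prototypes, class 2 is distinct.
prototypes = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0]]
corr = prototype_correlation(prototypes)

teacher = [0.6, 0.3, 0.1]  # teacher favors class 0, true label is 1
target = reconstruct_teacher_signal(teacher, label=1, corr=corr)
student = softmax([0.2, 0.5, 0.1])
loss = kl_divergence(target, student)
```

In this toy setup the reconstructed target moves probability mass toward the true class while still keeping mass on the semantically close class 0, which is one plausible reading of "selective" distillation. In a real training loop the loss would be computed with an autodiff framework rather than plain Python lists.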