Shichang Fu, Tao Lu, Jiaming Wang, Yu Gu, Jiayi Cai, Kui Jiang
Journal of Visual Communication and Image Representation, Vol. 110, Article 104468. Published 2025-05-06. DOI: 10.1016/j.jvcir.2025.104468
Citations: 0
GDPS: A general distillation architecture for end-to-end person search
Existing knowledge distillation methods for person search handle the detection and re-identification (re-id) subtasks separately, which may lead to feature conflicts between them. On the one hand, distilling only the detection task makes the network focus more on features common to all pedestrians, which may hurt re-id performance. On the other hand, distilling only the re-id task makes the network more inclined to focus on the identity-specific characteristics of individual pedestrians, which may harm detection performance. To solve this problem, we propose General Distillation for Person Search (GDPS), a novel distillation method that treats person search as a single task and distills both subtasks within a unified framework. Specifically, we distill feature-based knowledge to optimize the general features shared by detection and re-id, aiming for accurate localization of individuals. In addition, we perform relation-based and response-based knowledge distillation on the re-id task to obtain more discriminative person features. Finally, we integrate feature-based, relation-based and response-based knowledge into a general framework that distills the two subtasks simultaneously and can be readily applied to various end-to-end person search methods. Extensive experiments demonstrate the effectiveness of GDPS across different one-step person search methods. For example, with GDPS, AlignPS with a ResNet-50 backbone achieves 94.1% mAP on the CUHK-SYSU dataset, surpassing its 93.1% baseline by 1.0 percentage point and even outperforming the ResNet-50 DCN-based teacher model (94.0% mAP).
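The abstract names three kinds of distilled knowledge but does not give their loss functions. Purely as an illustration, the sketch below combines three common textbook forms. The feature-based term is a mean squared error on (assumed already size-matched) feature maps, standing in for the features shared by detection and re-id. The relation-based term matches pairwise cosine-similarity matrices of re-id embeddings for the same person boxes. The response-based term is a temperature-softened KL divergence on identity logits. All function names, the dictionary keys `feat`/`emb`/`logits`, the temperature of 4.0 and the unit loss weights are hypothetical assumptions. This is a sketch of generic knowledge distillation, not the GDPS implementation.

```python
import math
from typing import Dict, List, Sequence

# Hypothetical containers: a flattened feature map, and row-per-person matrices.
Vector = Sequence[float]
Matrix = Sequence[Sequence[float]]


def feature_kd_loss(f_student: Vector, f_teacher: Vector) -> float:
    """Feature-based KD (assumed form): MSE between flattened feature maps
    that already have the same size (a real student may need an adapter)."""
    if len(f_student) != len(f_teacher):
        raise ValueError("feature maps must have the same size")
    return sum((s - t) ** 2 for s, t in zip(f_student, f_teacher)) / len(f_student)


def _cosine_matrix(emb: Matrix) -> List[List[float]]:
    """Pairwise cosine similarities between the rows of an N x D matrix."""
    normed = []
    for row in emb:
        norm = math.sqrt(sum(x * x for x in row)) or 1.0  # guard zero vectors
        normed.append([x / norm for x in row])
    return [[sum(a * b for a, b in zip(u, v)) for v in normed] for u in normed]


def relation_kd_loss(e_student: Matrix, e_teacher: Matrix) -> float:
    """Relation-based KD (assumed form): MSE between the student's and the
    teacher's pairwise similarity matrices over the same N person boxes."""
    s, t = _cosine_matrix(e_student), _cosine_matrix(e_teacher)
    n = len(s)
    return sum((s[i][j] - t[i][j]) ** 2 for i in range(n) for j in range(n)) / (n * n)


def _log_softmax(z: Vector) -> List[float]:
    """Numerically stable log-softmax of one logit vector."""
    m = max(z)
    log_norm = m + math.log(sum(math.exp(x - m) for x in z))
    return [x - log_norm for x in z]


def response_kd_loss(l_student: Matrix, l_teacher: Matrix,
                     temperature: float = 4.0) -> float:
    """Response-based KD (assumed form): KL(teacher || student) between
    temperature-softened identity distributions, averaged over persons and
    scaled by temperature**2 (the conventional choice)."""
    total = 0.0
    for zs, zt in zip(l_student, l_teacher):
        log_ps = _log_softmax([x / temperature for x in zs])
        log_pt = _log_softmax([x / temperature for x in zt])
        total += sum(math.exp(lt) * (lt - ls) for lt, ls in zip(log_pt, log_ps))
    return total / len(l_student) * temperature ** 2


def general_kd_loss(student: Dict[str, list], teacher: Dict[str, list],
                    w_feat: float = 1.0, w_rel: float = 1.0,
                    w_resp: float = 1.0) -> float:
    """Weighted sum of the three terms. The keys "feat", "emb", "logits" and
    the unit weights are illustrative assumptions, not values from GDPS."""
    return (w_feat * feature_kd_loss(student["feat"], teacher["feat"])
            + w_rel * relation_kd_loss(student["emb"], teacher["emb"])
            + w_resp * response_kd_loss(student["logits"], teacher["logits"]))
```

As a sanity check, passing the same dictionary as both student and teacher gives a total loss of exactly 0.0, because every term compares identical quantities. In a real detector, both networks would have to produce their features, embeddings and logits for the same image and the same person boxes for the relation and response terms to be meaningful.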
About the journal:
The Journal of Visual Communication and Image Representation publishes papers on state-of-the-art visual communication and image representation, with emphasis on novel technologies and theoretical work in this multidisciplinary area of pure and applied research. The field of visual communication and image representation is considered in its broadest sense and covers both digital and analog aspects as well as processing and communication in biological visual systems.