Wen Wang, Zheyuan Lin, Shanshan Ji, Te Li, J. Gu, Minhong Wan, Chunlong Zhang
Published in: 2023 IEEE International Conference on Robotics and Biomimetics (ROBIO), pp. 1-6, 2023-12-04
DOI: https://doi.org/10.1109/ROBIO58561.2023.10354731
Reducing the Computational Cost of Transformers for Person Re-identification
Transformer-based vision models have progressed remarkably in recent years, and person re-identification (ReID) is one of the active research areas adopting transformers to improve performance. However, a major challenge in applying transformers to ReID is their high computational cost, which hinders real-time deployment. To address this issue, this paper proposes two simple yet effective techniques to reduce the computation of transformers for ReID. The first eliminates invalid patches that contain no person information, thereby reducing the number of tokens fed into the transformer. Because the computational complexity of self-attention is quadratic in the number of input tokens, the second technique partitions the image into multiple windows, applies separate transformers to each window, and merges the class tokens from the windows, which reduces the cost of the self-attention mechanism. Combining the two techniques reduces the FLOPs of the state-of-the-art (SOTA) baseline model by 12.2%, while slightly improving rank-1 accuracy and sacrificing only 1.1% mAP on the DukeMTMC-ReID dataset.
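The quadratic-cost argument behind both techniques can be illustrated with a little arithmetic. The sketch below is not the authors' implementation: the function names, the patch count, the embedding size, the background ratio, and the window count are all hypothetical, and the cost model counts only the two token-mixing matrix products of a single self-attention layer. How the paper detects invalid patches or merges class tokens is not described in the abstract and is not modelled here.

```python
from typing import List, Sequence


def attention_cost(num_tokens: int, dim: int) -> int:
    """Approximate multiply-adds for the token mixing of one attention layer.

    Counts only the QK^T product and the attention-times-V product,
    i.e. 2 * n^2 * d. Projections and MLP layers grow linearly with n
    and are deliberately ignored in this simplified model.
    """
    return 2 * num_tokens * num_tokens * dim


def keep_valid_patches(patch_is_person: Sequence[bool]) -> List[int]:
    """Return the indices of patches flagged as containing person content.

    The boolean mask is an assumed input; producing it is out of scope.
    """
    return [i for i, keep in enumerate(patch_is_person) if keep]


def windowed_attention_cost(num_patches: int, num_windows: int, dim: int) -> int:
    """Cost when patches are split evenly into windows processed separately.

    Each window is assumed to carry its own class token, so it attends
    over (num_patches / num_windows + 1) tokens.
    """
    if num_patches % num_windows != 0:
        raise ValueError("num_patches must divide evenly into num_windows")
    tokens_per_window = num_patches // num_windows + 1  # +1 for the class token
    return num_windows * attention_cost(tokens_per_window, dim)


if __name__ == "__main__":
    # Hypothetical setting: 128 patches, 768-dim tokens, every 4th patch is background.
    mask = [i % 4 != 0 for i in range(128)]
    kept = keep_valid_patches(mask)

    full = attention_cost(128 + 1, 768)          # all patches plus one class token
    pruned = attention_cost(len(kept) + 1, 768)  # after dropping invalid patches
    windowed = windowed_attention_cost(len(kept), 4, 768)

    print(f"tokens kept: {len(kept)}")
    print(f"full: {full}, pruned: {pruned}, pruned + 4 windows: {windowed}")
```

Under this simplified model, dropping tokens shrinks the quadratic term directly, and splitting the remaining n tokens into w windows reduces the mixing cost by roughly a factor of w. These ratios apply to the attention products alone, so they are not comparable to the whole-model 12.2% FLOPs reduction reported above.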