Transformer for Object Re-identification: A Survey

IF 11.6 · Zone 2 (Computer Science) · JCR Q1, COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE

International Journal of Computer Vision · Pub Date: 2024-11-23 · DOI: 10.1007/s11263-024-02284-4

Mang Ye, Shuoyi Chen, Chenyue Li, Wei-Shi Zheng, David Crandall, Bo Du


Citation count: 0

Abstract



Object re-identification (Re-ID) aims to identify specific objects across different times and scenes and is a widely researched task in computer vision. For a long time, the field was driven predominantly by deep learning methods based on convolutional neural networks. In recent years, the emergence of Vision Transformers has spurred a growing number of studies on Transformer-based Re-ID. These studies have repeatedly broken performance records and brought significant progress to the field. Transformers offer a powerful, flexible, and unified solution and cater to a wide array of Re-ID tasks with unparalleled efficacy. This paper provides a comprehensive review and in-depth analysis of Transformer-based Re-ID. We categorize existing works into Image/Video-Based Re-ID, Re-ID with Limited Data/Annotations, Cross-Modal Re-ID, and Special Re-ID Scenarios. For each category, we elucidate the advantages the Transformer demonstrates in addressing the many challenges of that domain. Given the growing interest in unsupervised Re-ID, we propose a new Transformer baseline, UntransReID, which achieves state-of-the-art performance on both single-modal and cross-modal tasks. For the under-explored animal Re-ID, we devise a standardized experimental benchmark and conduct extensive experiments. These experiments explore the applicability of Transformers to this task and are meant to facilitate future research. Finally, we discuss some important yet under-investigated open issues in the era of large foundation models. We believe this survey will serve as a new handbook for researchers in this field. A periodically updated website will be available at https://github.com/mangye16/ReID-Survey.
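Re-ID, as described in the abstract above, is usually posed as a retrieval problem:

1. A backbone (a CNN or a Vision Transformer) turns each query and gallery image into an embedding.
2. Gallery images are ranked by feature similarity to the query.
3. Results are scored with rank-1 accuracy and mean average precision (mAP).

The Python sketch below illustrates only the ranking-and-scoring step. It is not the authors' UntransReID or benchmark code. It assumes embeddings are already extracted and uses cosine similarity as the matching score. It also simplifies common evaluation protocols in two ways:

- It does not filter same-camera gallery matches.
- A query with no true match scores AP = 0 instead of being skipped.

All function names are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

# An (embedding, identity label) pair. The embedding would come from a
# CNN or Transformer backbone, which this sketch does not include.
Sample = Tuple[Sequence[float], int]


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def rank_gallery(query: Sequence[float], gallery: List[Sequence[float]]) -> List[int]:
    """Gallery indices sorted from most to least similar to the query."""
    scores = [cosine_similarity(query, g) for g in gallery]
    return sorted(range(len(gallery)), key=lambda i: scores[i], reverse=True)


def average_precision(ranked_ids: List[int], query_id: int) -> float:
    """Mean of precision@k taken at every rank k that holds a true match."""
    hits = 0
    precisions = []
    for k, pid in enumerate(ranked_ids, start=1):
        if pid == query_id:
            hits += 1
            precisions.append(hits / k)
    # Simplification: a query with no true match in the gallery scores 0.
    return sum(precisions) / len(precisions) if precisions else 0.0


def evaluate(queries: List[Sample], gallery: List[Sample]) -> Tuple[float, float]:
    """Return (rank-1 accuracy, mAP) over all queries."""
    gallery_feats = [feat for feat, _ in gallery]
    gallery_ids = [pid for _, pid in gallery]
    rank1_hits = 0
    aps = []
    for feat, qid in queries:
        order = rank_gallery(feat, gallery_feats)
        ranked_ids = [gallery_ids[i] for i in order]
        if ranked_ids[0] == qid:
            rank1_hits += 1
        aps.append(average_precision(ranked_ids, qid))
    return rank1_hits / len(queries), sum(aps) / len(aps)


if __name__ == "__main__":
    # Toy 2-D "embeddings" for two identities (7 and 3).
    gallery = [([1.0, 0.0], 7), ([0.0, 1.0], 3), ([0.9, 0.1], 7), ([0.1, 0.9], 3)]
    queries = [([1.0, 0.05], 7), ([0.05, 1.0], 3), ([0.6, 0.8], 7)]
    rank1, mean_ap = evaluate(queries, gallery)
    print(f"rank-1 = {rank1:.3f}, mAP = {mean_ap:.3f}")
```

On this toy data, two of the three queries retrieve a correct identity first, so the script prints `rank-1 = 0.667, mAP = 0.806`. Real Re-ID benchmarks use far larger galleries. There, the similarity step would typically be vectorized, for example as one matrix product over L2-normalized features, rather than computed in a Python loop.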


Source journal

International Journal of Computer Vision (Engineering & Technology / Computer Science: Artificial Intelligence)

CiteScore: 29.80
Self-citation rate: 2.10%
Articles published: 163
Review time: 6 months

Journal description: The International Journal of Computer Vision (IJCV) serves as a platform for sharing new research findings in the rapidly growing field of computer vision. It publishes 12 issues annually and presents high-quality, original contributions to the science and engineering of computer vision. The journal publishes several types of articles to suit different research outputs:

- Regular articles, up to 25 journal pages, present significant technical advances of broad interest to the field.
- Short articles, limited to 10 pages, offer a faster publication path for new research results.
- Survey articles, up to 30 pages, critically evaluate the current state of the art in computer vision or give tutorial presentations of relevant topics.

The journal also includes book reviews, position papers, and editorials by prominent scientific figures. These complement the technical content and provide valuable perspectives. Authors are encouraged to provide supplementary material online, such as images, video sequences, data sets, and software, to improve the understanding and reproducibility of the published research.