Lifelong visible–infrared person re-identification via replay samples domain-modality-mix reconstruction and cross-domain cognitive network
Xianyu Zhu, Guoqiang Xiao, Michael S. Lew, Song Wu
Computer Vision and Image Understanding, Volume 254, Article 104328 (March 2025)
DOI: 10.1016/j.cviu.2025.104328
URL: https://www.sciencedirect.com/science/article/pii/S1077314225000517
Citations: 0
Abstract
Adapting statically trained models to the incessant influx of data streams poses a pivotal research challenge. Concurrently, visible–infrared person re-identification (VI-ReID) offers an all-day surveillance mode that advances intelligent surveillance and elevates public safety. Hence, we pioneer a more fine-grained, camera-level exploration of the lifelong VI-ReID (LVI-ReID) task, aiming to imbue learned models with the capability for lifelong learning and memory over continuous data streams. This task confronts the dual challenges of cross-modality and cross-domain variation. Thus, in this paper, we propose a Domain-Modality-Mix (DMM) based replay-sample reconstruction strategy and a Cross-domain Cognitive Network (CDCN) to address these challenges. First, we establish an effective and expandable baseline model based on residual neural networks. Second, capitalizing on the unexploited knowledge in a memory bank that archives diverse replay samples, we enhance the anti-forgetting ability of our model with the Domain-Modality-Mix strategy, which devises a cross-domain, cross-modal, image-level replay-sample reconstruction that effectively alleviates the catastrophic forgetting induced by modality and domain variations. Finally, guided by the Chunking Theory from cognitive psychology, we design a Cross-domain Cognitive Network that incorporates a camera-aware, expandable graph convolutional cognitive network to facilitate adaptive learning of intra-modal consistencies and cross-modal similarities within continuous cross-domain data streams. Extensive experiments demonstrate that our proposed method exhibits remarkable adaptability and robust resistance to forgetting, and outperforms multiple state-of-the-art methods in comparative assessments of LVI-ReID performance. The source code of our method is available at https://github.com/SWU-CS-MediaLab/DMM-CDCN.
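The abstract describes the Domain-Modality-Mix strategy only at a high level: an image-level reconstruction that blends replay samples across domains and modalities to fight catastrophic forgetting. The paper's actual reconstruction procedure is not given here, so the following is a minimal, hypothetical mixup-style sketch of the general idea — blending a current-domain sample with a replay sample of a different modality drawn from a memory bank. The function name, the Beta-distributed mixing coefficient, and the array layout are all illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def domain_modality_mix(current_img, replay_img, lam=None, alpha=0.5, rng=None):
    """Hypothetical image-level mix of a current sample with a replay
    sample from another modality/domain (mixup-style linear blend).

    current_img, replay_img: float arrays of identical shape, values in [0, 1].
    lam: mixing coefficient; if None, sampled from Beta(alpha, alpha).
    """
    if rng is None:
        rng = np.random.default_rng()
    if lam is None:
        lam = rng.beta(alpha, alpha)
    return lam * current_img + (1.0 - lam) * replay_img

# Usage: blend a visible-light sample from the current stream with an
# infrared replay sample archived in the memory bank (shapes are C, H, W).
rng = np.random.default_rng(0)
vis = rng.random((3, 256, 128))   # current visible-modality image
ir = rng.random((3, 256, 128))    # replayed infrared-modality image
mixed = domain_modality_mix(vis, ir, rng=rng)
assert mixed.shape == vis.shape
```

The blend keeps every mixed pixel between the two source pixels, so the reconstructed replay sample carries appearance statistics from both the old domain/modality and the new one, which is the intuition behind using such samples to reduce forgetting.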
About the Journal
The central focus of this journal is the computer analysis of pictorial information. Computer Vision and Image Understanding publishes papers covering all aspects of image analysis from the low-level, iconic processes of early vision to the high-level, symbolic processes of recognition and interpretation. A wide range of topics in the image understanding area is covered, including papers offering insights that differ from predominant views.
Research Areas Include:
• Theory
• Early vision
• Data structures and representations
• Shape
• Range
• Motion
• Matching and recognition
• Architecture and languages
• Vision systems