基于重放样本域-模态-混合重构和跨域认知网络的终身可见-红外人再识别

IF 4.3 3区计算机科学 Q2 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE

Computer Vision and Image Understanding Pub Date : 2025-03-01 DOI:10.1016/j.cviu.2025.104328

Xianyu Zhu , Guoqiang Xiao , Michael S. Lew , Song Wu

{"title":"基于重放样本域-模态-混合重构和跨域认知网络的终身可见-红外人再识别","authors":"Xianyu Zhu , Guoqiang Xiao , Michael S. Lew , Song Wu","doi":"10.1016/j.cviu.2025.104328","DOIUrl":null,"url":null,"abstract":"<div><div>Adapting statically-trained models to the incessant influx of data streams poses a pivotal research challenge. Concurrently, visible and infrared person re-identification (VI-ReID) offers an all-day surveillance mode to advance intelligent surveillance and elevate public safety precautions. Hence, we are pioneering a more fine-grained exploration of the lifelong VI-ReID task at the camera level, aiming to imbue the learned models with the capabilities of lifelong learning and memory within the continuous data streams. This task confronts dual challenges of cross-modality and cross-domain variations. Thus, in this paper, we proposed a Domain-Modality-Mix (DMM) based replay samples reconstruction strategy and Cross-domain Cognitive Network (CDCN) to address those challenges. Firstly, we establish an effective and expandable baseline model based on residual neural networks. Secondly, capitalizing on the unexploited potential knowledge of a memory bank that archives diverse replay samples, we enhance the anti-forgetting ability of our model by the Domain-Modality-Mix strategy, which devising a cross-domain, cross-modal image-level replay sample reconstruction, effectively alleviating catastrophic forgetting induced by modality and domain variations. Finally, guided by the Chunking Theory in cognitive psychology, we designed a Cross-domain Cognitive Network, which incorporates a camera-aware, expandable graph convolutional cognitive network to facilitate adaptive learning of intra-modal consistencies and cross-modal similarities within continuous cross-domain data streams. Extensive experiments demonstrate that our proposed method has remarkable adaptability and robust resistance to forgetting and outperforms multiple state-of-the-art methods in comparative assessments of the performance of LVI-ReID. The source code of our designed method is at <span><span>https://github.com/SWU-CS-MediaLab/DMM-CDCN</span><svg><path></path></svg></span>.</div></div>","PeriodicalId":50633,"journal":{"name":"Computer Vision and Image Understanding","volume":"254 ","pages":"Article 104328"},"PeriodicalIF":4.3000,"publicationDate":"2025-03-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"Lifelong visible–infrared person re-identification via replay samples domain-modality-mix reconstruction and cross-domain cognitive network\",\"authors\":\"Xianyu Zhu , Guoqiang Xiao , Michael S. Lew , Song Wu\",\"doi\":\"10.1016/j.cviu.2025.104328\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"<div><div>Adapting statically-trained models to the incessant influx of data streams poses a pivotal research challenge. Concurrently, visible and infrared person re-identification (VI-ReID) offers an all-day surveillance mode to advance intelligent surveillance and elevate public safety precautions. Hence, we are pioneering a more fine-grained exploration of the lifelong VI-ReID task at the camera level, aiming to imbue the learned models with the capabilities of lifelong learning and memory within the continuous data streams. This task confronts dual challenges of cross-modality and cross-domain variations. Thus, in this paper, we proposed a Domain-Modality-Mix (DMM) based replay samples reconstruction strategy and Cross-domain Cognitive Network (CDCN) to address those challenges. Firstly, we establish an effective and expandable baseline model based on residual neural networks. Secondly, capitalizing on the unexploited potential knowledge of a memory bank that archives diverse replay samples, we enhance the anti-forgetting ability of our model by the Domain-Modality-Mix strategy, which devising a cross-domain, cross-modal image-level replay sample reconstruction, effectively alleviating catastrophic forgetting induced by modality and domain variations. Finally, guided by the Chunking Theory in cognitive psychology, we designed a Cross-domain Cognitive Network, which incorporates a camera-aware, expandable graph convolutional cognitive network to facilitate adaptive learning of intra-modal consistencies and cross-modal similarities within continuous cross-domain data streams. Extensive experiments demonstrate that our proposed method has remarkable adaptability and robust resistance to forgetting and outperforms multiple state-of-the-art methods in comparative assessments of the performance of LVI-ReID. The source code of our designed method is at <span><span>https://github.com/SWU-CS-MediaLab/DMM-CDCN</span><svg><path></path></svg></span>.</div></div>\",\"PeriodicalId\":50633,\"journal\":{\"name\":\"Computer Vision and Image Understanding\",\"volume\":\"254 \",\"pages\":\"Article 104328\"},\"PeriodicalIF\":4.3000,\"publicationDate\":\"2025-03-01\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Computer Vision and Image Understanding\",\"FirstCategoryId\":\"94\",\"ListUrlMain\":\"https://www.sciencedirect.com/science/article/pii/S1077314225000517\",\"RegionNum\":3,\"RegionCategory\":\"计算机科学\",\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"Q2\",\"JCRName\":\"COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Computer Vision and Image Understanding","FirstCategoryId":"94","ListUrlMain":"https://www.sciencedirect.com/science/article/pii/S1077314225000517","RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q2","JCRName":"COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE","Score":null,"Total":0}

引用次数: 0

摘要

使静态训练的模型适应不断涌入的数据流是一个关键的研究挑战。同时，可见光和红外人员再识别（VI-ReID）提供全天监控模式，推进智能监控，提高公共安全防范水平。因此，我们在相机层面上对终身VI-ReID任务进行了更细粒度的探索，旨在为学习模型注入连续数据流中终身学习和记忆的能力。这一任务面临着跨模态和跨领域变化的双重挑战。为此，本文提出了基于域模态混合（domain - modal - mix， DMM）的重放样本重构策略和跨域认知网络（Cross-domain Cognitive Network， CDCN）来解决这些问题。首先，基于残差神经网络建立了有效的可扩展基线模型；其次，利用存储不同重播样本的记忆库中未开发的潜在知识，我们通过domain -modal - mix策略增强了模型的抗遗忘能力，该策略设计了跨域、跨模态的图像级重播样本重建，有效减轻了模态和域变化引起的灾难性遗忘。最后，在认知心理学分块理论的指导下，我们设计了一个跨域认知网络，它包含了一个摄像头感知的、可扩展的图卷积认知网络，以促进连续跨域数据流中模态内一致性和跨模态相似性的自适应学习。大量的实验表明，我们提出的方法具有显著的适应性和强大的抗遗忘能力，并且在LVI-ReID性能的比较评估中优于多种最先进的方法。我们设计的方法的源代码在https://github.com/SWU-CS-MediaLab/DMM-CDCN。

本文章由计算机程序翻译，如有差异，请以英文原文为准。

查看原文本刊更多论文

Lifelong visible–infrared person re-identification via replay samples domain-modality-mix reconstruction and cross-domain cognitive network

Adapting statically-trained models to the incessant influx of data streams poses a pivotal research challenge. Concurrently, visible and infrared person re-identification (VI-ReID) offers an all-day surveillance mode to advance intelligent surveillance and elevate public safety precautions. Hence, we are pioneering a more fine-grained exploration of the lifelong VI-ReID task at the camera level, aiming to imbue the learned models with the capabilities of lifelong learning and memory within the continuous data streams. This task confronts dual challenges of cross-modality and cross-domain variations. Thus, in this paper, we proposed a Domain-Modality-Mix (DMM) based replay samples reconstruction strategy and Cross-domain Cognitive Network (CDCN) to address those challenges. Firstly, we establish an effective and expandable baseline model based on residual neural networks. Secondly, capitalizing on the unexploited potential knowledge of a memory bank that archives diverse replay samples, we enhance the anti-forgetting ability of our model by the Domain-Modality-Mix strategy, which devising a cross-domain, cross-modal image-level replay sample reconstruction, effectively alleviating catastrophic forgetting induced by modality and domain variations. Finally, guided by the Chunking Theory in cognitive psychology, we designed a Cross-domain Cognitive Network, which incorporates a camera-aware, expandable graph convolutional cognitive network to facilitate adaptive learning of intra-modal consistencies and cross-modal similarities within continuous cross-domain data streams. Extensive experiments demonstrate that our proposed method has remarkable adaptability and robust resistance to forgetting and outperforms multiple state-of-the-art methods in comparative assessments of the performance of LVI-ReID. The source code of our designed method is at https://github.com/SWU-CS-MediaLab/DMM-CDCN.

求助全文

通过发布文献求助，成功后即可免费获取论文全文。去求助

来源期刊

Computer Vision and Image Understanding 工程技术-工程：电子与电气

CiteScore

7.80

自引率

4.40%

发文量

112

审稿时长

79 days

期刊介绍： The central focus of this journal is the computer analysis of pictorial information. Computer Vision and Image Understanding publishes papers covering all aspects of image analysis from the low-level, iconic processes of early vision to the high-level, symbolic processes of recognition and interpretation. A wide range of topics in the image understanding area is covered, including papers offering insights that differ from predominant views. Research Areas Include: • Theory • Early vision • Data structures and representations • Shape • Range • Motion • Matching and recognition • Architecture and languages • Vision systems