LRMM: Low rank multi-scale multi-modal fusion for person re-identification based on RGB-NI-TI

IF 7.5 | Region 1: Computer Science | JCR Q1: COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE

Expert Systems with Applications | Pub Date: 2024-11-12 | DOI: 10.1016/j.eswa.2024.125716

Di Wu, Zhihui Liu, Zihan Chen, Shenglong Gan, Kaiwen Tan, Qin Wan, Yaonan Wang

Expert Systems with Applications, Volume 263, Article 125716. Full text: https://www.sciencedirect.com/science/article/pii/S0957417424025831

Citations: 0

Abstract


Person re-identification (Re-ID) is a crucial task in video surveillance that aims to match images of the same person across non-overlapping camera views. Recent methods introduce the Near-Infrared (NI) modality to alleviate the limitations of a single visible-light modality under low-light conditions, but they overlook the importance of modality-related information. To bring in further complementary information, this paper proposes a novel multi-modal person Re-ID approach based on RGB, NI, and Thermal-Infrared (TI) images. First, we design a multi-scale multi-modal interaction module to facilitate cross-modal information fusion across multiple scales. Second, we propose a low-rank multi-modal fusion module that decomposes the features and the weights in parallel and then uses low-rank modality-specific factors for multi-modal fusion, so that the model fuses features from multiple modalities more efficiently and with lower complexity. Finally, we propose a multi-modality prototype loss that supervises the network jointly with the cross-entropy loss, encouraging the network to learn modality-specific information by increasing intra-class cross-modality similarity and enlarging inter-class differences. Experiments on benchmark multi-modal Re-ID datasets (RGBNT201, RGBNT100, MSVR310) and on constructed multi-modal versions of the person Re-ID datasets Market1501 and PRW validate the effectiveness of the proposed approach against state-of-the-art methods.
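The abstract does not spell out the fusion module, so the following is only a minimal sketch of the generic idea behind low-rank fusion with modality-specific factors, not the authors' implementation. It assumes three feature vectors (standing in for RGB, NI, and TI), a hypothetical rank of 2, and random weights. All names (`low_rank_fuse`, `full_tensor_fuse`, `factors`) are illustrative. The sketch checks that contracting the features with a full weight tensor built from the factors gives the same result as multiplying per-modality projections, which is where the reduction in complexity comes from.

```python
import math
import random


def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def low_rank_fuse(feats, factors, out_dim):
    """Fuse modality features using rank-R modality-specific factors.

    factors[m][r][k] is the weight vector for modality m, rank index r,
    and output unit k. Each output is a sum over ranks of the product of
    the per-modality projections, so no full weight tensor is ever built.
    """
    rank = len(factors[0])
    fused = []
    for k in range(out_dim):
        total = 0.0
        for r in range(rank):
            term = 1.0
            for m, z in enumerate(feats):
                term *= dot(factors[m][r][k], z)
            total += term
        fused.append(total)
    return fused


def full_tensor_fuse(feats, factors, out_dim):
    """Reference path: rebuild each full weight entry, then contract it
    with the outer product of the three feature vectors."""
    z_rgb, z_ni, z_ti = feats
    rank = len(factors[0])
    fused = []
    for k in range(out_dim):
        total = 0.0
        for i, a in enumerate(z_rgb):
            for j, b in enumerate(z_ni):
                for l, c in enumerate(z_ti):
                    w = sum(
                        factors[0][r][k][i] * factors[1][r][k][j] * factors[2][r][k][l]
                        for r in range(rank)
                    )
                    total += w * a * b * c
        fused.append(total)
    return fused


# Hypothetical sizes: feature dims per modality, rank, and output dim.
rng = random.Random(0)
dims, rank, out_dim = (4, 3, 5), 2, 3
feats = [[rng.uniform(-1, 1) for _ in range(d)] for d in dims]
factors = [
    [[[rng.uniform(-1, 1) for _ in range(d)] for _ in range(out_dim)] for _ in range(rank)]
    for d in dims
]

fast = low_rank_fuse(feats, factors, out_dim)
slow = full_tensor_fuse(feats, factors, out_dim)
agree = all(math.isclose(x, y, rel_tol=1e-9, abs_tol=1e-9) for x, y in zip(fast, slow))
```

With these toy sizes the full weight tensor would hold 3 x 4 x 3 x 5 = 180 entries, while the factors hold 2 x 3 x (4 + 3 + 5) = 72. The gap widens quickly as feature dimensions grow, because the full tensor scales with the product of the modality dimensions and the factors scale with their sum.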


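Source journal

Expert Systems with Applications (Engineering & Technology - Engineering: Electrical & Electronic)

CiteScore: 13.80

Self-citation rate: 10.60%

Articles published: 2045

Review time: 8.7 months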

About the journal: Expert Systems With Applications is an international journal dedicated to the exchange of information on expert and intelligent systems used globally in industry, government, and universities. The journal emphasizes original papers covering the design, development, testing, implementation, and management of these systems, offering practical guidelines. It spans various sectors such as finance, engineering, marketing, law, project management, information management, medicine, and more. The journal also welcomes papers on multi-agent systems, knowledge management, neural networks, knowledge discovery, data mining, and other related areas, excluding applications to military/defense systems.