RXNet: cross-modality person re-identification based on a dual-branch network

IF 3.5 | CAS Region 2, Computer Science | JCR Q2, COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE
Weiyang Zhang, Jiong Guo, Qiang Liu, Maoyang Zou, Honggang Chen, Jing Peng
DOI: 10.1007/s10489-025-06501-6
Journal: Applied Intelligence, Vol. 55, No. 15
Published: 2025-10-01 (Journal Article)
URL: https://link.springer.com/article/10.1007/s10489-025-06501-6
Cited by: 0

Abstract

Text-based person re-identification (TI-ReID) aims to match individuals by integrating information from both images and textual descriptions. TI-ReID is challenging because of the large feature gap between the two modalities. Contemporary techniques commonly merge global and local characteristics to obtain more detailed feature representations; however, they depend on additional models for human pose estimation or segmentation to extract local features, which makes them difficult to apply in practice. To solve this problem, we propose a dual-path network based on RegNet and XLNet for TI-ReID (RXNet). In the image branch, RegNet acquires multi-level semantic image features and dynamically aggregates distinct local features through visual attention. In the text branch, XLNet extracts salient semantic features from the text via a bidirectional encoding scheme built on an autoregressive model. Furthermore, to increase the efficacy of our model, we develop both residual triplet attention and dual attention to align features across modalities. Additionally, we replace the cross-entropy ID loss with a label-smoothing ID loss to prevent overfitting while improving training efficiency. Experimental results on the CUHK-PEDES dataset show that the proposed method achieves a rank-1/mAP accuracy of 85.49%/73.40%, outperforming current state-of-the-art methods by a large margin.
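The abstract mentions replacing the standard cross-entropy ID loss with a smoothing ID loss to curb overfitting. As a minimal sketch only (the paper's exact formulation and smoothing factor are not given here; the `epsilon = 0.1` default and the uniform epsilon/(C-1) spread over non-target identities are assumptions), a label-smoothed identity-classification loss can be written as:

```python
import torch
import torch.nn as nn


class LabelSmoothingIDLoss(nn.Module):
    """Cross-entropy over identity logits with label smoothing.

    The target distribution puts (1 - epsilon) on the true identity and
    spreads epsilon uniformly over the remaining (num_classes - 1) IDs,
    which discourages over-confident predictions on the training IDs.
    """

    def __init__(self, num_classes: int, epsilon: float = 0.1):
        super().__init__()
        self.num_classes = num_classes
        self.epsilon = epsilon

    def forward(self, logits: torch.Tensor, targets: torch.Tensor) -> torch.Tensor:
        # logits: (batch, num_classes), targets: (batch,) integer IDs
        log_probs = torch.log_softmax(logits, dim=1)
        with torch.no_grad():
            # Smoothed one-hot target distribution.
            smooth = torch.full_like(log_probs, self.epsilon / (self.num_classes - 1))
            smooth.scatter_(1, targets.unsqueeze(1), 1.0 - self.epsilon)
        # Cross-entropy between the smoothed targets and predicted log-probs.
        return (-smooth * log_probs).sum(dim=1).mean()
```

With `epsilon = 0`, this reduces exactly to the ordinary cross-entropy ID loss, so the smoothing term isolates the regularization effect described in the abstract.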

Source journal: Applied Intelligence (Engineering & Technology – Computer Science: Artificial Intelligence)
CiteScore: 6.60
Self-citation rate: 20.80%
Articles published per year: 1361
Review time: 5.9 months
Journal description: With a focus on research in artificial intelligence and neural networks, this journal addresses issues involving solutions of real-life manufacturing, defense, management, government and industrial problems which are too complex to be solved through conventional approaches and require the simulation of intelligent thought processes, heuristics, applications of knowledge, and distributed and parallel processing. The integration of these multiple approaches in solving complex problems is of particular importance. The journal presents new and original research and technological developments, addressing real and complex issues applicable to difficult problems. It provides a medium for exchanging scientific research and technological achievements accomplished by the international community.