RXNet: cross-modality person re-identification based on a dual-branch network

IF 3.5 | CAS Region 2, Computer Science | JCR Q2, COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE
Weiyang Zhang, Jiong Guo, Qiang Liu, Maoyang Zou, Honggang Chen, Jing Peng
DOI: 10.1007/s10489-025-06501-6
Journal: Applied Intelligence, Vol. 55, No. 15
Published: 2025-10-01 (Journal Article)
URL: https://link.springer.com/article/10.1007/s10489-025-06501-6
Cited by: 0

Abstract

Text-based person re-identification (TI-ReID) aims to match individuals by integrating information from both images and textual descriptions. TI-ReID is challenging because of the large feature gap between the two modalities. Contemporary techniques commonly merge global and local characteristics to obtain more detailed feature representations; however, they depend on additional models for human pose estimation or segmentation to extract local features, which makes them difficult to apply in practice. To solve this problem, we propose a dual-path network based on RegNet and XLNet for TI-ReID (RXNet). In the image branch, RegNet acquires multi-level semantic image features and dynamically aggregates distinct local features through visual attention. In the text branch, XLNet extracts salient semantic features from the text via a bidirectional encoding scheme built on an autoregressive model. Furthermore, to increase the efficacy of our model, we develop both residual triplet attention and dual attention to align features across modalities. Additionally, we replace the cross-entropy ID loss with a label-smoothing ID loss to prevent overfitting while improving training efficiency. Experimental results on the CUHK-PEDES dataset show that the proposed method achieves a rank-1/mAP accuracy of 85.49%/73.40%, outperforming current state-of-the-art methods by a large margin.
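The abstract mentions replacing the standard cross-entropy ID loss with a smoothing ID loss to curb overfitting. As a minimal sketch only (the paper's exact formulation and smoothing factor are not given here; the `epsilon = 0.1` default and the uniform epsilon/(C-1) spread over non-target identities are assumptions), a label-smoothed identity-classification loss can be written as:

```python
import torch
import torch.nn as nn


class LabelSmoothingIDLoss(nn.Module):
    """Cross-entropy over identity logits with label smoothing.

    The target distribution puts (1 - epsilon) on the true identity and
    spreads epsilon uniformly over the remaining (num_classes - 1) IDs,
    which discourages over-confident predictions on the training IDs.
    """

    def __init__(self, num_classes: int, epsilon: float = 0.1):
        super().__init__()
        self.num_classes = num_classes
        self.epsilon = epsilon

    def forward(self, logits: torch.Tensor, targets: torch.Tensor) -> torch.Tensor:
        # logits: (batch, num_classes), targets: (batch,) integer IDs
        log_probs = torch.log_softmax(logits, dim=1)
        with torch.no_grad():
            # Smoothed one-hot target distribution.
            smooth = torch.full_like(log_probs, self.epsilon / (self.num_classes - 1))
            smooth.scatter_(1, targets.unsqueeze(1), 1.0 - self.epsilon)
        # Cross-entropy between the smoothed targets and predicted log-probs.
        return (-smooth * log_probs).sum(dim=1).mean()
```

With `epsilon = 0`, this reduces exactly to the ordinary cross-entropy ID loss, so the smoothing term isolates the regularization effect described in the abstract.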

Source journal: Applied Intelligence (Engineering & Technology – Computer Science: Artificial Intelligence)
CiteScore: 6.60
Self-citation rate: 20.80%
Articles published per year: 1361
Review time: 5.9 months
Journal description: With a focus on research in artificial intelligence and neural networks, this journal addresses issues involving solutions of real-life manufacturing, defense, management, government and industrial problems which are too complex to be solved through conventional approaches and require the simulation of intelligent thought processes, heuristics, applications of knowledge, and distributed and parallel processing. The integration of these multiple approaches in solving complex problems is of particular importance. The journal presents new and original research and technological developments, addressing real and complex issues applicable to difficult problems. It provides a medium for exchanging scientific research and technological achievements accomplished by the international community.