{"title":"RXNet:基于双分支网络的跨模态人员再识别","authors":"Weiyang Zhang, Jiong Guo, Qiang Liu, Maoyang Zou, Honggang Chen, Jing Peng","doi":"10.1007/s10489-025-06501-6","DOIUrl":null,"url":null,"abstract":"<div><p>The goal of text-based person re-identification (TI-ReID) is to match individuals using various methods by integrating information from both images and text. TI-ReID encounters significant challenges because of the clear differences in features between images and textual descriptions. Contemporary techniques commonly utilize a method that merges general and specific characteristics to obtain more detailed feature representations. However, these techniques depend on additional models for estimating or segmenting human poses to determine local characteristics, making it challenging to apply them in practice. To solve this problem, we propose a dual-path network based on RegNet and XLNet for TI-ReID (RXNet). In the image segment, RegNet is employed to acquire multitiered semantic image attributes and dynamically assimilate distinct local features through visual focus. In the text segment, XLNet is utilized, to extract significant semantic attributes from the text via a two-way encoding system based on an autoregressive model. Furthermore, to increase the efficacy of our model, we develop both residual triplet attention and dual attention to align features across different modalities. Additionally, we replace cross-entropy ID loss with smoothing ID loss to prevent overfitting while improving the efficiency of the model. 
Experimental results on the CUHK-PEDES dataset show that the proposed method achieves a rank-1/mAP accuracy of 85.49%/73.40%, outperforming the current state-of-the-art methods by a large margin.</p></div>","PeriodicalId":8041,"journal":{"name":"Applied Intelligence","volume":"55 15","pages":""},"PeriodicalIF":3.5000,"publicationDate":"2025-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"RXNet: cross-modality person re-identification based on a dual-branch network\",\"authors\":\"Weiyang Zhang, Jiong Guo, Qiang Liu, Maoyang Zou, Honggang Chen, Jing Peng\",\"doi\":\"10.1007/s10489-025-06501-6\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"<div><p>The goal of text-based person re-identification (TI-ReID) is to match individuals using various methods by integrating information from both images and text. TI-ReID encounters significant challenges because of the clear differences in features between images and textual descriptions. Contemporary techniques commonly utilize a method that merges general and specific characteristics to obtain more detailed feature representations. However, these techniques depend on additional models for estimating or segmenting human poses to determine local characteristics, making it challenging to apply them in practice. To solve this problem, we propose a dual-path network based on RegNet and XLNet for TI-ReID (RXNet). In the image segment, RegNet is employed to acquire multitiered semantic image attributes and dynamically assimilate distinct local features through visual focus. In the text segment, XLNet is utilized, to extract significant semantic attributes from the text via a two-way encoding system based on an autoregressive model. Furthermore, to increase the efficacy of our model, we develop both residual triplet attention and dual attention to align features across different modalities. 
Additionally, we replace cross-entropy ID loss with smoothing ID loss to prevent overfitting while improving the efficiency of the model. Experimental results on the CUHK-PEDES dataset show that the proposed method achieves a rank-1/mAP accuracy of 85.49%/73.40%, outperforming the current state-of-the-art methods by a large margin.</p></div>\",\"PeriodicalId\":8041,\"journal\":{\"name\":\"Applied Intelligence\",\"volume\":\"55 15\",\"pages\":\"\"},\"PeriodicalIF\":3.5000,\"publicationDate\":\"2025-10-01\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Applied Intelligence\",\"FirstCategoryId\":\"94\",\"ListUrlMain\":\"https://link.springer.com/article/10.1007/s10489-025-06501-6\",\"RegionNum\":2,\"RegionCategory\":\"计算机科学\",\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"Q2\",\"JCRName\":\"COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Applied Intelligence","FirstCategoryId":"94","ListUrlMain":"https://link.springer.com/article/10.1007/s10489-025-06501-6","RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q2","JCRName":"COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE","Score":null,"Total":0}
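The abstract describes replacing the cross-entropy ID loss with a smoothing variant to curb overfitting. A minimal sketch of label-smoothed cross-entropy, which is the standard technique this wording usually refers to (the paper's exact formulation and smoothing factor are not given here; `eps` is an assumed hyperparameter):

```python
import math

def softmax(logits):
    # Numerically stable softmax over a list of raw scores.
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def smoothed_id_loss(logits, target, num_classes, eps=0.1):
    # Label-smoothed cross-entropy: the one-hot ID target is replaced by
    # (1 - eps) on the true class and eps / (num_classes - 1) spread over
    # the other identities, penalizing over-confident predictions.
    probs = softmax(logits)
    loss = 0.0
    for c in range(num_classes):
        q = (1.0 - eps) if c == target else eps / (num_classes - 1)
        loss -= q * math.log(probs[c])
    return loss
```

With `eps = 0` this reduces to the ordinary cross-entropy ID loss; for a confident, correct prediction the smoothed loss is strictly larger, which is what discourages the model from collapsing onto one-hot identity targets.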
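The rank-1 accuracy reported in the results is the standard retrieval metric: for each text query, the top-scoring gallery image must share the query's identity. A generic sketch (not the paper's evaluation code; `sim`, `query_ids`, and `gallery_ids` are illustrative names):

```python
def rank1_accuracy(sim, gallery_ids, query_ids):
    # sim[i][j]: similarity between text query i and gallery image j.
    # A query counts as correct if its highest-similarity gallery image
    # carries the same person ID.
    correct = 0
    for i, row in enumerate(sim):
        best = max(range(len(row)), key=lambda j: row[j])
        if gallery_ids[best] == query_ids[i]:
            correct += 1
    return correct / len(sim)
```

mAP additionally averages precision over the full ranked list per query, so it rewards placing *all* matching images near the top rather than just one.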
Journal introduction:
With a focus on research in artificial intelligence and neural networks, this journal addresses real-life problems in manufacturing, defense, management, government, and industry that are too complex to be solved through conventional approaches and instead require the simulation of intelligent thought processes, heuristics, applications of knowledge, and distributed and parallel processing. The integration of these multiple approaches in solving complex problems is of particular importance.
The journal presents new and original research and technological developments, addressing real and complex issues applicable to difficult problems. It provides a medium for exchanging scientific research and technological achievements accomplished by the international community.