ARTransformer: An Architecture of Resolution Representation Learning for Cross-Resolution Person Re-Identification

IF 1.5, Zone 4 (Computer Science), Q3 COMPUTER SCIENCE, SOFTWARE ENGINEERING

Concurrency and Computation-Practice & Experience Pub Date : 2025-02-24 DOI:10.1002/cpe.8348

Xing Lu, Fengshan Lai, Zhixiang Cao, Daoxun Xia

Volume 37 (4-5), Journal Article, https://onlinelibrary.wiley.com/doi/10.1002/cpe.8348

Citations: 0

Abstract

Cross-resolution person re-identification (CR-ReID) addresses the challenge of retrieving and matching images of a specific person across cameras with varying resolutions. Many existing studies use established CNNs and ViTs to resize captured low-resolution (LR) images and align them with high-resolution (HR) image features, or construct common feature spaces in which images of different resolutions can be matched. However, these methods ignore the potential feature connection between the LR and HR images of the same pedestrian identity. Moreover, CNNs and ViTs often produce outliers in the attention maps of LR images; this tendency to concentrate excessively on anomalous information may obscure the genuine, expected characteristics shared between images, making it difficult to extract meaningful information from them. In this work, we propose the abnormal feature elimination and reconfiguration Transformer (ARTransformer), a novel network architecture for robust cross-resolution person re-identification. The method uses a resolution feature discriminator to learn resolution-invariant features and output feature matrices for images of different resolutions. It then computes the potential feature relationships between images of pedestrians with the same identity but different resolutions through a new cross-resolution landmark agent attention (CR-LAA) mechanism. Finally, it uses the output feature matrices to model LR and HR image interactions, mitigating abnormal image features and focusing attention on the target person by learning representations from input images of various resolutions. Experimental results show that ARTransformer performs well in matching images of different resolutions, even unseen ones, and extensive evaluations on four real-world datasets confirm the effectiveness of our approach.
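The abstract does not say how CR-LAA is computed, so the following is only an illustrative sketch of the general idea of routing attention between LR and HR feature matrices through a small set of landmark tokens. Everything in it is an assumption rather than the authors' method: the function names, the mean-pooled landmarks, the two-step softmax attention, and the absence of learned projections are all hypothetical.

```python
import math

def softmax(row):
    # Numerically stable softmax over one row of scores.
    m = max(row)
    exps = [math.exp(v - m) for v in row]
    total = sum(exps)
    return [e / total for e in exps]

def transpose(a):
    return [list(col) for col in zip(*a)]

def matmul(a, b):
    # (n, k) x (k, m) -> (n, m), matrices as lists of rows.
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def attend(queries, keys, values):
    # Scaled dot-product attention; returns the output and the weights.
    scale = 1.0 / math.sqrt(len(queries[0]))
    scores = matmul(queries, transpose(keys))
    weights = [softmax([s * scale for s in row]) for row in scores]
    return matmul(weights, values), weights

def mean_pool(tokens, num_groups):
    # Hypothetical landmark choice: average contiguous groups of tokens.
    # Produces at most num_groups landmarks.
    num_groups = min(num_groups, len(tokens))
    size = math.ceil(len(tokens) / num_groups)
    groups = [tokens[i:i + size] for i in range(0, len(tokens), size)]
    return [[sum(col) / len(g) for col in zip(*g)] for g in groups]

def landmark_cross_attention(lr_tokens, hr_tokens, num_landmarks=2):
    # Assumed two-step scheme (not taken from the paper):
    # 1) landmarks pooled from the LR tokens summarize the HR tokens;
    # 2) every LR token reads from those landmark summaries.
    landmarks = mean_pool(lr_tokens, num_landmarks)
    summaries, _ = attend(landmarks, hr_tokens, hr_tokens)
    fused, weights = attend(lr_tokens, landmarks, summaries)
    return fused, weights

# Toy feature matrices: 4 LR tokens and 6 HR tokens, each of dimension 2.
lr = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.5, 0.5]]
hr = [[1.0, 0.0], [0.0, 1.0], [2.0, 0.0], [0.0, 2.0], [1.0, 1.0], [0.5, 0.2]]
fused, weights = landmark_cross_attention(lr, hr, num_landmarks=2)
```

Because every LR token attends to only a handful of landmarks instead of all HR tokens, the cost of this kind of scheme grows with the number of landmarks rather than with the product of the two token counts. Whether ARTransformer makes the same trade-off is not stated in the abstract.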

Source journal

Concurrency and Computation-Practice & Experience (Engineering & Technology, Computer Science: Theory & Methods)

CiteScore

5.00

Self-citation rate

10.00%

Articles published

664

Review time

9.6 months

About the journal: Concurrency and Computation: Practice and Experience (CCPE) publishes high-quality, original research papers and authoritative research review papers in the overlapping fields of: Parallel and distributed computing; High-performance computing; Computational and data science; Artificial intelligence and machine learning; Big data applications, algorithms, and systems; Network science; Ontologies and semantics; Security and privacy; Cloud/edge/fog computing; Green computing; and Quantum computing.