Vehicle re-identification with large separable kernel attention and hybrid channel attention

IF 4.2 3区计算机科学 Q2 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE

Image and Vision Computing Pub Date : 2025-02-17 DOI:10.1016/j.imavis.2025.105442

Xuezhi Xiang , Zhushan Ma , Xiaoheng Li , Lei Zhang , Xiantong Zhen

{"title":"Vehicle re-identification with large separable kernel attention and hybrid channel attention","authors":"Xuezhi Xiang , Zhushan Ma , Xiaoheng Li , Lei Zhang , Xiantong Zhen","doi":"10.1016/j.imavis.2025.105442","DOIUrl":null,"url":null,"abstract":"<div><div>With the rapid development of intelligent transportation systems and the popularity of smart city infrastructure, Vehicle Re-ID technology has become an important research field. The vehicle Re-ID task faces an important challenge, which is the high similarity between different vehicles. Existing methods use additional detection or segmentation models to extract differentiated local features. However, these methods either rely on additional annotations or greatly increase the computational cost. Using attention mechanism to capture global and local features is crucial to solve the challenge of high similarity between classes in vehicle Re-ID tasks. In this paper, we propose LSKA-ReID with large separable kernel attention and hybrid channel attention. Specifically, the large separable kernel attention (LSKA) utilizes the advantages of self-attention and also benefits from the advantages of convolution, which can extract the global and local features of the vehicle more comprehensively. We also compare the performance of LSKA and large kernel attention (LKA) on the vehicle ReID task. We also introduce hybrid channel attention (HCA), which combines channel attention with spatial information, so that the model can better focus on channels and feature regions, and ignore background and other disturbing information. Extensive experiments on three popular datasets VeRi-776, VehicleID and VERI-Wild demonstrate the effectiveness of LSKA-ReID. In particular, on VeRi-776 dataset, mAP reaches 86.78% and Rank-1 reaches 98.09%.</div></div>","PeriodicalId":50374,"journal":{"name":"Image and Vision Computing","volume":"155 ","pages":"Article 105442"},"PeriodicalIF":4.2000,"publicationDate":"2025-02-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Image and Vision Computing","FirstCategoryId":"94","ListUrlMain":"https://www.sciencedirect.com/science/article/pii/S0262885625000307","RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q2","JCRName":"COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE","Score":null,"Total":0}

引用次数: 0

Abstract

With the rapid development of intelligent transportation systems and the popularity of smart city infrastructure, Vehicle Re-ID technology has become an important research field. The vehicle Re-ID task faces an important challenge, which is the high similarity between different vehicles. Existing methods use additional detection or segmentation models to extract differentiated local features. However, these methods either rely on additional annotations or greatly increase the computational cost. Using attention mechanism to capture global and local features is crucial to solve the challenge of high similarity between classes in vehicle Re-ID tasks. In this paper, we propose LSKA-ReID with large separable kernel attention and hybrid channel attention. Specifically, the large separable kernel attention (LSKA) utilizes the advantages of self-attention and also benefits from the advantages of convolution, which can extract the global and local features of the vehicle more comprehensively. We also compare the performance of LSKA and large kernel attention (LKA) on the vehicle ReID task. We also introduce hybrid channel attention (HCA), which combines channel attention with spatial information, so that the model can better focus on channels and feature regions, and ignore background and other disturbing information. Extensive experiments on three popular datasets VeRi-776, VehicleID and VERI-Wild demonstrate the effectiveness of LSKA-ReID. In particular, on VeRi-776 dataset, mAP reaches 86.78% and Rank-1 reaches 98.09%.

查看原文本刊更多论文

基于大可分离核注意和混合通道注意的车辆再识别

随着智能交通系统的快速发展和智慧城市基础设施的普及，车辆再识别技术已成为一个重要的研究领域。车辆重新识别任务面临着一个重要的挑战，即不同车辆之间的高度相似性。现有的方法使用额外的检测或分割模型来提取差异化的局部特征。然而，这些方法要么依赖于额外的注释，要么大大增加了计算成本。利用注意力机制捕获全局和局部特征是解决车辆Re-ID任务中类别之间高度相似的关键。本文提出了具有大可分离核注意和混合通道注意的LSKA-ReID算法。其中，大可分离核注意（large separated kernel attention， LSKA）既利用了自注意的优点，又利用了卷积的优点，可以更全面地提取车辆的全局和局部特征。我们还比较了LSKA和大核关注（LKA）在车辆ReID任务上的性能。我们还引入了混合通道注意（HCA），将通道注意与空间信息相结合，使模型能够更好地关注通道和特征区域，而忽略背景等干扰信息。在VeRi-776、VehicleID和VERI-Wild三个流行数据集上的大量实验证明了LSKA-ReID的有效性。特别是在VeRi-776数据集上，mAP达到86.78%，Rank-1达到98.09%。

本文章由计算机程序翻译，如有差异，请以英文原文为准。

求助全文

约1分钟内获得全文求助全文

来源期刊

Image and Vision Computing 工程技术-工程：电子与电气

CiteScore

8.50

自引率

8.50%

发文量

143

审稿时长

7.8 months

期刊介绍： Image and Vision Computing has as a primary aim the provision of an effective medium of interchange for the results of high quality theoretical and applied research fundamental to all aspects of image interpretation and computer vision. The journal publishes work that proposes new image interpretation and computer vision methodology or addresses the application of such methods to real world scenes. It seeks to strengthen a deeper understanding in the discipline by encouraging the quantitative comparison and performance evaluation of the proposed methodology. The coverage includes: image interpretation, scene modelling, object recognition and tracking, shape analysis, monitoring and surveillance, active vision and robotic systems, SLAM, biologically-inspired computer vision, motion analysis, stereo vision, document image understanding, character and handwritten text recognition, face and gesture recognition, biometrics, vision-based human-computer interaction, human activity and behavior understanding, data fusion from multiple sensor inputs, image databases.