SiamS3C: spatial-channel cross-correlation for visual tracking with centerness-guided regression

IF 3.1, Zone 3 (Computer Science), JCR Q2 (COMPUTER SCIENCE, INFORMATION SYSTEMS)

Multimedia Systems, Pub Date: 2024-08-20, DOI: 10.1007/s00530-024-01450-5

Jianming Zhang, Wentao Chen, Yufan He, Li-Dan Kuang, Arun Kumar Sangaiah


Citations: 0

SiamS3C: spatial-channel cross-correlation for visual tracking with centerness-guided regression


Visual object tracking can be divided into object classification and bounding-box regression tasks, but sharing a single correlation map between the two tasks leads to inaccuracy. Siamese trackers compute the correlation map with a cross-correlation operation that has a high computational cost, and because this operation is performed either on channels or in the spatial domain, its perception of global information is weak. In addition, some Siamese trackers with a centerness branch ignore the association between the centerness branch and the bounding-box regression branch. To alleviate these problems, we propose a visual object tracker based on Spatial-Channel Cross-Correlation and Centerness-Guided Regression. Firstly, we propose a spatial-channel cross-correlation module (SC3M) that combines the search region feature and the template feature both on channels and in the spatial domain, which suppresses interference from distractors. As a lightweight module, SC3M computes two independent correlation maps that are fed to different subnetworks. Secondly, we propose a centerness-guided regression subnetwork consisting of the centerness branch and the bounding-box regression branch. The centerness guides the whole regression subnetwork, strengthening the association between the two branches and further suppressing low-quality predicted bounding boxes. Thirdly, we have conducted extensive experiments on five challenging benchmarks: GOT-10k, VOT2018, TrackingNet, OTB100 and UAV123. The results show that our tracker performs well and meets the real-time requirement, running at 48.52 fps.
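For readers unfamiliar with the two building blocks the abstract refers to, the following is a minimal NumPy sketch of the conventional operations, not of SC3M or the authors' subnetwork. It assumes a channel-wise (depth-wise) cross-correlation of the kind commonly used in Siamese trackers and an FCOS-style centerness score; the abstract does not give either formula, so the function names, tensor shapes, and the centerness definition here are illustrative assumptions.

```python
import numpy as np


def depthwise_xcorr(search, template):
    """Channel-wise cross-correlation (assumed baseline, not SC3M).

    search:   (C, Hs, Ws) search-region features
    template: (C, Ht, Wt) template features, with Ht <= Hs and Wt <= Ws
    returns:  (C, Hs - Ht + 1, Ws - Wt + 1) correlation map
    """
    c, hs, ws = search.shape
    ct, ht, wt = template.shape
    assert c == ct, "search and template must have the same channel count"
    out = np.zeros((c, hs - ht + 1, ws - wt + 1), dtype=np.float64)
    for i in range(out.shape[1]):
        for j in range(out.shape[2]):
            # Slide the template over the search features, one channel at a time.
            window = search[:, i:i + ht, j:j + wt]
            out[:, i, j] = (window * template).sum(axis=(1, 2))
    return out


def centerness(left, top, right, bottom):
    """FCOS-style centerness from distances to the four box sides (assumed)."""
    horizontal = np.minimum(left, right) / np.maximum(left, right)
    vertical = np.minimum(top, bottom) / np.maximum(top, bottom)
    return np.sqrt(horizontal * vertical)


# Toy inputs: every 2x2 window of ones correlates to 4 in each channel.
corr = depthwise_xcorr(np.ones((2, 4, 4)), np.ones((2, 2, 2)))
# A location at the box centre scores 1; off-centre locations score lower,
# so multiplying a classification score by it down-weights poor boxes.
centre_score = centerness(1.0, 1.0, 1.0, 1.0)
edge_score = centerness(1.0, 2.0, 3.0, 2.0)
```

In this baseline each channel is correlated independently, so no information is mixed across channels; that limitation is what the abstract describes as weak perception of global information.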


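Source journal

Multimedia Systems (Engineering & Technology, Computer Science: Theory & Methods)

CiteScore: 5.40
Self-citation rate: 7.70%
Articles published: 148
Review time: 4.5 months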

About the journal: This journal details innovative research ideas, emerging technologies, state-of-the-art methods and tools in all aspects of multimedia computing, communication, storage, and applications. It features theoretical, experimental, and survey articles.