UCT: Learning Unified Convolutional Networks for Real-Time Visual Tracking

2017 IEEE International Conference on Computer Vision Workshops (ICCVW), Pub Date: 2017-11-10, DOI: 10.1109/ICCVW.2017.231

Zheng Zhu, Guan Huang, Wei Zou, Dalong Du, Chang Huang


Citations: 77

Abstract

Tracking approaches based on convolutional neural networks (CNNs) have shown favorable performance in recent benchmarks. Nonetheless, the chosen CNN features are always pre-trained on a different task, and the individual components of the tracking system are learned separately, so the achieved tracking performance may be suboptimal. Moreover, most of these trackers are not designed for real-time applications because of their time-consuming feature extraction and complex optimization. In this paper, we propose an end-to-end framework that learns the convolutional features and performs the tracking process simultaneously, namely a unified convolutional tracker (UCT). Specifically, the UCT treats both the feature extractor and the tracking process (ridge regression) as convolution operations and trains them jointly, so that the learned CNN features are tightly coupled to the tracking process. For online tracking, an efficient updating method is proposed by introducing a peak-versus-noise ratio (PNR) criterion, and scale changes are handled efficiently by incorporating a scale branch into the network. The proposed approach delivers superior tracking performance while maintaining real-time speed: the standard UCT and UCT-Lite track generic objects at 41 FPS and 154 FPS, respectively, without further optimization. Experiments on four challenging tracking benchmarks (OTB2013, OTB2015, VOT2014 and VOT2015) show that our method achieves state-of-the-art results compared with other real-time trackers.
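
The abstract names a peak-versus-noise ratio (PNR) criterion for deciding when to update the model online, but does not define it. The sketch below is only an illustration under stated assumptions: it assumes PNR is the gap between the highest and lowest response divided by the mean of all non-peak responses, and that an update is accepted when the current PNR exceeds its running average. The function names `peak_versus_noise_ratio` and `should_update` are hypothetical and not taken from the paper.

```python
from typing import Sequence

import numpy as np


def peak_versus_noise_ratio(response: np.ndarray) -> float:
    """Assumed PNR: (max - min) / mean of every response except the peak."""
    flat = response.ravel().astype(float)
    peak_index = int(flat.argmax())
    noise = np.delete(flat, peak_index)  # all responses other than the peak
    # Small constant guards against division by zero on an all-zero background.
    return float((flat[peak_index] - flat.min()) / (noise.mean() + 1e-12))


def should_update(pnr: float, pnr_history: Sequence[float]) -> bool:
    """Assumed gate: update only if the current PNR beats its running average."""
    if len(pnr_history) == 0:
        return True  # no history yet, so accept the first frame
    return pnr > float(np.mean(pnr_history))


# A sharp, single-peaked response map (confident localisation).
sharp = np.full((5, 5), 0.1)
sharp[2, 2] = 1.0

# A flat response map (ambiguous localisation, e.g. under occlusion).
blurry = np.full((5, 5), 0.5)
blurry[2, 2] = 0.6

history = [5.0, 6.0]
for name, response_map in (("sharp", sharp), ("blurry", blurry)):
    pnr = peak_versus_noise_ratio(response_map)
    print(name, round(pnr, 3), should_update(pnr, history))
```

Under these assumptions a confident, single-peaked response yields a high ratio and triggers an update, while a flat response is skipped, which avoids contaminating the model with unreliable frames.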

