Learning Spatio-Appearance Memory Network for High-Performance Visual Tracking

Fei Xie, Wankou Yang, Kaihua Zhang, Bo Liu, Guangting Wang, W. Zuo
{"title":"Learning Spatio-Appearance Memory Network for High-Performance Visual Tracking","authors":"Fei Xie, Wankou Yang, Kaihua Zhang, Bo Liu, Guangting Wang, W. Zuo","doi":"10.1109/ICCVW54120.2021.00302","DOIUrl":null,"url":null,"abstract":"Segmentation-based tracking is currently a promising tracking paradigm due to the robustness towards non-grid deformations, comparing to the traditional box-based tracking methods. However, existing segmentation-based trackers are insufficient in modeling and exploiting dense pixel-wise correspondence across frames. To overcome these limitations, this paper presents a novel segmentation-based tracking architecture equipped with spatio-appearance memory networks. The appearance memory network utilizes spatio-temporal non-local similarity to propagate segmentation mask to the current frame, which can effectively capture long-range appearance variations and we further treat discriminative correlation filter as spatial memory bank to store the mapping between feature map and spatial map. Moreover, mutual promotion on dual memory networks greatly boost the overall tracking performance. We further propose a dynamic memory machine (DMM) which employs the Earth Mover’s Distance (EMD) to reweight memory samples. Without bells and whistles, our simple-yet-effective tracking architecture sets a new state-of-the-art on six tracking benchmarks. Besides, our approach achieves comparable results on two video object segmentation benchmarks. 
Code and model are released at https://github.com/phiphiphi31/DMB.","PeriodicalId":226794,"journal":{"name":"2021 IEEE/CVF International Conference on Computer Vision Workshops (ICCVW)","volume":"176 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2021-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"7","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"2021 IEEE/CVF International Conference on Computer Vision Workshops (ICCVW)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/ICCVW54120.2021.00302","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 7

Abstract

Segmentation-based tracking is currently a promising paradigm because of its robustness to non-rigid deformations compared with traditional box-based tracking methods. However, existing segmentation-based trackers are insufficient in modeling and exploiting dense pixel-wise correspondence across frames. To overcome these limitations, this paper presents a novel segmentation-based tracking architecture equipped with spatio-appearance memory networks. The appearance memory network exploits spatio-temporal non-local similarity to propagate segmentation masks to the current frame, effectively capturing long-range appearance variations; we further treat the discriminative correlation filter as a spatial memory bank that stores the mapping between the feature map and the spatial map. Moreover, mutual promotion between the dual memory networks greatly boosts overall tracking performance. We further propose a dynamic memory machine (DMM) that employs the Earth Mover's Distance (EMD) to reweight memory samples. Without bells and whistles, our simple yet effective tracking architecture sets a new state of the art on six tracking benchmarks. In addition, our approach achieves comparable results on two video object segmentation benchmarks. Code and models are released at https://github.com/phiphiphi31/DMB.
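The abstract's mask-propagation step can be pictured as a non-local (attention-style) memory read: the current frame's features are compared against features stored from past frames, and the stored masks are blended accordingly. The NumPy sketch below is a minimal illustration under assumed shapes and names (the function `memory_read` and the flattened `H*W` spatial layout are assumptions, not the paper's implementation):

```python
import numpy as np

def memory_read(query_feat, mem_feats, mem_masks):
    """Propagate stored soft masks to the query frame via
    non-local (dot-product) similarity.

    query_feat: (C, N)    features of the current frame, N = H*W
    mem_feats:  (T, C, N) features of T memory frames
    mem_masks:  (T, N)    soft foreground masks stored with them
    Returns a (N,) soft mask for the current frame.
    """
    T, C, N = mem_feats.shape
    keys = mem_feats.transpose(1, 0, 2).reshape(C, T * N)  # (C, T*N)
    values = mem_masks.reshape(T * N)                      # (T*N,)
    # similarity between every query location and every memory location
    sim = query_feat.T @ keys / np.sqrt(C)                 # (N, T*N)
    sim = np.exp(sim - sim.max(axis=1, keepdims=True))
    attn = sim / sim.sum(axis=1, keepdims=True)            # row-wise softmax
    # each query pixel's mask is a convex combination of stored mask values
    return attn @ values                                   # (N,)
```

Because each output pixel is a convex combination of stored mask values, the propagated mask stays in [0, 1] whenever the stored masks do.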
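The abstract does not spell out how the DMM applies EMD to reweight memory samples; purely as an illustration, the sketch below (the names `emd_1d` and `reweight_memory` and the 1-D histogram representation are assumptions, not the paper's formulation) down-weights stored samples whose feature distribution has drifted far from the current frame:

```python
import numpy as np

def emd_1d(p, q):
    """EMD between two 1-D histograms over the same unit-spaced
    bins equals the L1 distance between their CDFs."""
    p = p / p.sum()
    q = q / q.sum()
    return np.abs(np.cumsum(p - q)).sum()

def reweight_memory(current_hist, memory_hists, tau=1.0):
    """Reweight memory samples by their EMD to the current frame.

    current_hist: (B,)   feature histogram of the current frame
    memory_hists: (T, B) histograms of the T stored memory samples
    Returns (T,) non-negative weights summing to 1.
    """
    d = np.array([emd_1d(current_hist, h) for h in memory_hists])
    w = np.exp(-d / tau)  # closer samples receive larger weight
    return w / w.sum()
```

A sample identical to the current frame gets the largest weight, so stale or drifted memory entries contribute less to subsequent reads.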