视频目标跟踪预测决策网络

2020 IEEE International Conference on Image Processing (ICIP) Pub Date : 2020-10-01 DOI:10.1109/ICIP40778.2020.9191145

Yasheng Sun, Tao He, Ying-hong Peng, Jin Qi, Jie Hu

{"title":"视频目标跟踪预测决策网络","authors":"Yasheng Sun, Tao He, Ying-hong Peng, Jin Qi, Jie Hu","doi":"10.1109/ICIP40778.2020.9191145","DOIUrl":null,"url":null,"abstract":"In this paper, we introduce an approach for visual tracking in videos that predicts the bounding box location of a target object at every frame. This tracking problem is formulated as a sequential decision-making process where both historical and current information are taken into account to decide the correct object location. We develop a deep reinforcement learning based strategy, via which the target object position is predicted and decided in a unified framework. Specifically, a RNN based prediction network is developed where local features and global features are fused together to predict object movement. Together with the predicted movement, some predefined possible offsets and detection results form into an action space. A decision network is trained in a reinforcement manner to learn to select the most reasonable tracking box from the action space, through which the target object is tracked at each frame. Experiments in an existing tracking benchmark demonstrate the effectiveness and robustness of our proposed strategy.","PeriodicalId":405734,"journal":{"name":"2020 IEEE International Conference on Image Processing (ICIP)","volume":"832 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2020-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"Prediction-Decision Network For Video Object Tracking\",\"authors\":\"Yasheng Sun, Tao He, Ying-hong Peng, Jin Qi, Jie Hu\",\"doi\":\"10.1109/ICIP40778.2020.9191145\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"In this paper, we introduce an approach for visual tracking in videos that predicts the bounding box location of a target object at every frame. This tracking problem is formulated as a sequential decision-making process where both historical and current information are taken into account to decide the correct object location. We develop a deep reinforcement learning based strategy, via which the target object position is predicted and decided in a unified framework. Specifically, a RNN based prediction network is developed where local features and global features are fused together to predict object movement. Together with the predicted movement, some predefined possible offsets and detection results form into an action space. A decision network is trained in a reinforcement manner to learn to select the most reasonable tracking box from the action space, through which the target object is tracked at each frame. Experiments in an existing tracking benchmark demonstrate the effectiveness and robustness of our proposed strategy.\",\"PeriodicalId\":405734,\"journal\":{\"name\":\"2020 IEEE International Conference on Image Processing (ICIP)\",\"volume\":\"832 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2020-10-01\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"2020 IEEE International Conference on Image Processing (ICIP)\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1109/ICIP40778.2020.9191145\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"2020 IEEE International Conference on Image Processing (ICIP)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/ICIP40778.2020.9191145","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}

引用次数: 0

摘要

在本文中，我们介绍了一种视频视觉跟踪的方法，该方法在每一帧预测目标物体的边界框位置。这种跟踪问题被表述为一个连续的决策过程，其中考虑了历史和当前信息来确定正确的目标位置。我们开发了一种基于深度强化学习的策略，通过该策略在统一的框架中预测和决定目标物体的位置。具体来说，开发了一种基于RNN的预测网络，将局部特征和全局特征融合在一起来预测物体的运动。与预测的运动一起，一些预定义的可能偏移和检测结果形成一个动作空间。以强化的方式训练决策网络，学习从动作空间中选择最合理的跟踪框，每帧跟踪目标对象。在现有跟踪基准上的实验证明了我们所提出的策略的有效性和鲁棒性。

本文章由计算机程序翻译，如有差异，请以英文原文为准。

查看原文本刊更多论文

Prediction-Decision Network For Video Object Tracking

In this paper, we introduce an approach for visual tracking in videos that predicts the bounding box location of a target object at every frame. This tracking problem is formulated as a sequential decision-making process where both historical and current information are taken into account to decide the correct object location. We develop a deep reinforcement learning based strategy, via which the target object position is predicted and decided in a unified framework. Specifically, a RNN based prediction network is developed where local features and global features are fused together to predict object movement. Together with the predicted movement, some predefined possible offsets and detection results form into an action space. A decision network is trained in a reinforcement manner to learn to select the most reasonable tracking box from the action space, through which the target object is tracked at each frame. Experiments in an existing tracking benchmark demonstrate the effectiveness and robustness of our proposed strategy.

求助全文

通过发布文献求助，成功后即可免费获取论文全文。去求助

来源期刊

2020 IEEE International Conference on Image Processing (ICIP)

自引率

0.00%

发文量