Temporal Difference Rewards for End-to-end Vision-based Active Robot Tracking using Deep Reinforcement Learning

Pavlos Tiritiris, N. Passalis, A. Tefas
DOI: 10.1109/ICETCI51973.2021.9574071
Published in: 2021 International Conference on Emerging Techniques in Computational Intelligence (ICETCI)
Publication date: 2021-08-25
Citations: 2

Abstract

Object tracking allows for localizing moving objects in sequences of frames, providing detailed information about the trajectories of objects that appear in a scene. In this paper, we study active object tracking, where a tracker receives an input visual observation and directly outputs the most appropriate control actions to follow the target and keep it in its field of view, thereby unifying the tasks of visual tracking and control. This is in contrast with conventional tracking approaches, as typically developed by the computer vision community, where the problem of detecting the tracked object in a frame is decoupled from the problem of controlling the camera and/or the robot to follow the object. Deep Reinforcement Learning (DRL) methods are well suited to overcoming these issues, since they allow for tackling both problems, i.e., detecting the tracked object and providing control commands, at the same time. However, DRL algorithms require a significantly different training methodology compared to traditional computer vision models: they rely on dynamic simulations for training instead of static datasets, and they are often notoriously difficult to converge, frequently requiring reward shaping approaches to increase convergence speed and stability. The main contribution of this paper is a vision-based DRL active tracking method, along with an appropriately designed reward shaping approach for active tracking problems. The developed methods are evaluated using a state-of-the-art robotics simulator, demonstrating good generalization to various dynamic trajectories of moving objects under a wide range of different setups.
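The abstract does not give the exact form of the reward, but the title's "temporal difference rewards" is consistent with potential-based reward shaping, where the shaped reward is the base reward plus the discounted temporal difference of a potential function. As a minimal illustrative sketch (the potential function `tracking_potential`, its arguments, and all constants below are assumptions, not taken from the paper), one could reward the agent for moving the target toward the image center:

```python
import numpy as np

def tracking_potential(target_pos, frame_center, max_dist):
    """Illustrative potential: 1.0 when the target is at the image
    center, decaying linearly to 0.0 at distance max_dist.

    target_pos, frame_center: (x, y) pixel coordinates.
    max_dist: normalization constant, e.g. half the frame diagonal.
    """
    dist = np.linalg.norm(np.asarray(target_pos, dtype=float)
                          - np.asarray(frame_center, dtype=float))
    return 1.0 - min(dist / max_dist, 1.0)

def shaped_reward(base_reward, prev_pos, curr_pos,
                  frame_center, max_dist, gamma=0.99):
    """Potential-based shaping: r' = r + gamma * phi(s') - phi(s).

    This form preserves the optimal policy of the underlying task
    while providing a denser learning signal, which is the usual
    motivation for shaping in hard-to-converge DRL problems.
    """
    phi_prev = tracking_potential(prev_pos, frame_center, max_dist)
    phi_curr = tracking_potential(curr_pos, frame_center, max_dist)
    return base_reward + gamma * phi_curr - phi_prev
```

In this sketch the agent receives a positive shaping bonus whenever an action brings the target closer to the center of the field of view, and a penalty when the target drifts outward, independently of the sparse base reward.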