基于端到端深度强化学习的PTZ相机自主控制

Proceedings of the 8th ACM/IEEE Conference on Internet of Things Design and Implementation Pub Date : 2023-04-10 DOI:10.1145/3576842.3582366

S. Sandha, Bharathan Balaji, L. Garcia, Mani Srivastava

{"title":"基于端到端深度强化学习的PTZ相机自主控制","authors":"S. Sandha, Bharathan Balaji, L. Garcia, Mani Srivastava","doi":"10.1145/3576842.3582366","DOIUrl":null,"url":null,"abstract":"Existing approaches for autonomous control of pan-tilt-zoom (PTZ) cameras use multiple stages where object detection and localization are performed separately from the control of the PTZ mechanisms. These approaches require manual labels and suffer from performance bottlenecks due to error propagation across the multi-stage flow of information. The large size of object detection neural networks also makes prior solutions infeasible for real-time deployment in resource-constrained devices. We present an end-to-end deep reinforcement learning (RL) solution called Eagle1 to train a neural network policy that directly takes images as input to control the PTZ camera. Training reinforcement learning is cumbersome in the real world due to labeling effort, runtime environment stochasticity, and fragile experimental setups. We introduce a photo-realistic simulation framework for training and evaluation of PTZ camera control policies. Eagle achieves superior camera control performance by maintaining the object of interest close to the center of captured images at high resolution and has up to 17% more tracking duration than the state-of-the-art. Eagle policies are lightweight (90x fewer parameters than Yolo5s) and can run on embedded camera platforms such as Raspberry PI (33 FPS) and Jetson Nano (38 FPS), facilitating real-time PTZ tracking for resource-constrained environments. With domain randomization, Eagle policies trained in our simulator can be transferred directly to real-world scenarios2.","PeriodicalId":266438,"journal":{"name":"Proceedings of the 8th ACM/IEEE Conference on Internet of Things Design and Implementation","volume":"129 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2023-04-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"Eagle: End-to-end Deep Reinforcement Learning based Autonomous Control of PTZ Cameras\",\"authors\":\"S. Sandha, Bharathan Balaji, L. Garcia, Mani Srivastava\",\"doi\":\"10.1145/3576842.3582366\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Existing approaches for autonomous control of pan-tilt-zoom (PTZ) cameras use multiple stages where object detection and localization are performed separately from the control of the PTZ mechanisms. These approaches require manual labels and suffer from performance bottlenecks due to error propagation across the multi-stage flow of information. The large size of object detection neural networks also makes prior solutions infeasible for real-time deployment in resource-constrained devices. We present an end-to-end deep reinforcement learning (RL) solution called Eagle1 to train a neural network policy that directly takes images as input to control the PTZ camera. Training reinforcement learning is cumbersome in the real world due to labeling effort, runtime environment stochasticity, and fragile experimental setups. We introduce a photo-realistic simulation framework for training and evaluation of PTZ camera control policies. Eagle achieves superior camera control performance by maintaining the object of interest close to the center of captured images at high resolution and has up to 17% more tracking duration than the state-of-the-art. Eagle policies are lightweight (90x fewer parameters than Yolo5s) and can run on embedded camera platforms such as Raspberry PI (33 FPS) and Jetson Nano (38 FPS), facilitating real-time PTZ tracking for resource-constrained environments. With domain randomization, Eagle policies trained in our simulator can be transferred directly to real-world scenarios2.\",\"PeriodicalId\":266438,\"journal\":{\"name\":\"Proceedings of the 8th ACM/IEEE Conference on Internet of Things Design and Implementation\",\"volume\":\"129 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2023-04-10\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Proceedings of the 8th ACM/IEEE Conference on Internet of Things Design and Implementation\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1145/3576842.3582366\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Proceedings of the 8th ACM/IEEE Conference on Internet of Things Design and Implementation","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1145/3576842.3582366","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}

引用次数: 0

摘要

现有的平移-倾斜-变焦(PTZ)相机的自主控制方法使用多个阶段，其中目标检测和定位与PTZ机制的控制分开执行。这些方法需要手动标记，并且由于错误在多阶段信息流中传播而受到性能瓶颈的影响。目标检测神经网络的庞大规模也使得之前的解决方案无法在资源受限的设备上进行实时部署。我们提出了一个端到端的深度强化学习(RL)解决方案，称为Eagle1，以训练一个神经网络策略，该策略直接将图像作为输入来控制PTZ相机。由于标注工作、运行环境随机性和脆弱的实验设置，在现实世界中训练强化学习是很麻烦的。我们介绍了一个逼真的模拟框架，用于训练和评估PTZ相机控制策略。Eagle通过将感兴趣的物体保持在高分辨率的拍摄图像中心附近，实现了卓越的相机控制性能，并且比最先进的跟踪时间延长了17%。Eagle策略是轻量级的(比yolo5少90倍的参数)，可以在嵌入式相机平台上运行，如树莓派(33 FPS)和Jetson Nano (38 FPS)，促进对资源受限环境的实时PTZ跟踪。通过领域随机化，在模拟器中训练的Eagle策略可以直接转移到现实场景2。

本文章由计算机程序翻译，如有差异，请以英文原文为准。

查看原文本刊更多论文

Eagle: End-to-end Deep Reinforcement Learning based Autonomous Control of PTZ Cameras

Existing approaches for autonomous control of pan-tilt-zoom (PTZ) cameras use multiple stages where object detection and localization are performed separately from the control of the PTZ mechanisms. These approaches require manual labels and suffer from performance bottlenecks due to error propagation across the multi-stage flow of information. The large size of object detection neural networks also makes prior solutions infeasible for real-time deployment in resource-constrained devices. We present an end-to-end deep reinforcement learning (RL) solution called Eagle1 to train a neural network policy that directly takes images as input to control the PTZ camera. Training reinforcement learning is cumbersome in the real world due to labeling effort, runtime environment stochasticity, and fragile experimental setups. We introduce a photo-realistic simulation framework for training and evaluation of PTZ camera control policies. Eagle achieves superior camera control performance by maintaining the object of interest close to the center of captured images at high resolution and has up to 17% more tracking duration than the state-of-the-art. Eagle policies are lightweight (90x fewer parameters than Yolo5s) and can run on embedded camera platforms such as Raspberry PI (33 FPS) and Jetson Nano (38 FPS), facilitating real-time PTZ tracking for resource-constrained environments. With domain randomization, Eagle policies trained in our simulator can be transferred directly to real-world scenarios2.

求助全文

通过发布文献求助，成功后即可免费获取论文全文。去求助

来源期刊

Proceedings of the 8th ACM/IEEE Conference on Internet of Things Design and Implementation

自引率

0.00%

发文量