Coordinate-aligned multi-camera collaboration for active multi-object tracking

IF 3.5; Zone 3 (Computer Science); JCR Q2 (COMPUTER SCIENCE, INFORMATION SYSTEMS)

Multimedia Systems Pub Date : 2024-07-29 DOI:10.1007/s00530-024-01420-x

Zeyu Fang, Jian Zhao, Mingyu Yang, Zhenbo Lu, Wengang Zhou, Houqiang Li

Abstract

Active Multi-Object Tracking (AMOT) is a task in which cameras are controlled by a centralized system to adjust their poses automatically and collaboratively so as to maximize the coverage of targets in their shared visual field. In AMOT, each camera receives only partial information from its own observation, which may mislead it into taking locally optimal actions. Moreover, the global goal, i.e., maximum coverage of objects, is hard to optimize directly. To address these issues, we propose a coordinate-aligned multi-camera collaboration system for AMOT. In our approach, we regard each camera as an agent and address AMOT with a multi-agent reinforcement learning solution. To represent the observation of each agent, we first identify the targets in the camera view with an image detector and then align the coordinates of the targets via inverse projection transformation. We define the reward of each agent based on global coverage and four individual reward terms. The action policy of the agents is derived from a value-based Q-network. To the best of our knowledge, we are the first to study the AMOT task. To train and evaluate our system, we build a virtual yet credible 3D environment, named “Soccer Court”, to mimic the real-world AMOT scenario. The experimental results show that our system outperforms the baseline and existing methods in various settings, including on real-world datasets.
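
The coordinate-alignment step and the coverage objective described above can be illustrated with a minimal sketch. The abstract does not give the paper's actual formulation, so everything here is an assumption made for illustration: a pinhole camera with known intrinsics (`fx`, `fy`, `cx`, `cy`), a known world-to-camera rotation and camera position, targets standing on the ground plane z = 0, and coverage measured as the fraction of targets seen by at least one camera. The function names `pixel_to_ground` and `coverage_ratio` are hypothetical.

```python
from typing import Iterable, Optional, Sequence, Set, Tuple

Vec3 = Tuple[float, float, float]
Mat3 = Sequence[Sequence[float]]


def pixel_to_ground(
    u: float, v: float,
    fx: float, fy: float, cx: float, cy: float,
    rot_world_to_cam: Mat3,
    cam_pos: Vec3,
) -> Optional[Tuple[float, float]]:
    """Back-project pixel (u, v) onto the ground plane z = 0 (assumed).

    Returns shared world (x, y) coordinates, or None if the viewing ray
    never reaches the ground in front of the camera.
    """
    # Ray direction in the camera frame (x right, y down, z along the optical axis).
    d_cam = ((u - cx) / fx, (v - cy) / fy, 1.0)
    # Rotate the ray into the world frame: d_world = R^T * d_cam.
    d_world = tuple(
        sum(rot_world_to_cam[k][i] * d_cam[k] for k in range(3))
        for i in range(3)
    )
    if abs(d_world[2]) < 1e-12:
        return None  # ray is parallel to the ground plane
    # Solve cam_pos.z + s * d_world.z = 0 for the ray parameter s.
    s = -cam_pos[2] / d_world[2]
    if s <= 0:
        return None  # intersection lies behind the camera
    return (cam_pos[0] + s * d_world[0], cam_pos[1] + s * d_world[1])


def coverage_ratio(visible_ids_per_camera: Iterable[Set[int]],
                   num_targets: int) -> float:
    """Fraction of targets seen by at least one camera (assumed definition)."""
    if num_targets == 0:
        return 0.0
    covered = set().union(*visible_ids_per_camera)
    return len(covered) / num_targets


# Hypothetical camera 10 m above the point (2, 3), looking straight down.
LOOK_DOWN = [[1, 0, 0], [0, -1, 0], [0, 0, -1]]
ground_xy = pixel_to_ground(820, 240, 500, 500, 320, 240,
                            LOOK_DOWN, (2.0, 3.0, 10.0))
```

Once every camera maps its detections into the same ground-plane frame, the agents' observations become directly comparable, and a team-level quantity such as the coverage ratio can be computed from the merged detections. How the paper combines global coverage with its four individual reward terms is not stated in the abstract and is not modelled here.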

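Source journal

Multimedia Systems (Engineering & Technology; Computer Science: Theory & Methods)

CiteScore: 5.40
Self-citation rate: 7.70%
Articles published: 148
Review time: 4.5 months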

About the journal: This journal details innovative research ideas, emerging technologies, state-of-the-art methods and tools in all aspects of multimedia computing, communication, storage, and applications. It features theoretical, experimental, and survey articles.