A unified multi-view multi-person tracking framework

IF 18.3 · Zone 3, Computer Science · Q1 COMPUTER SCIENCE, SOFTWARE ENGINEERING

Computational Visual Media Pub Date : 2023-11-30 DOI:10.1007/s41095-023-0334-8

Fan Yang, Shigeyuki Odashima, Sosuke Yamao, Hiroaki Fujimoto, Shoichi Masui, Shan Jiang


Citations: 0

Abstract

Despite significant developments in 3D multi-view multi-person (3D MM) tracking, current frameworks separately target footprint tracking or pose tracking. Frameworks designed for the former cannot be used for the latter, because they directly obtain 3D positions on the ground plane via a homography projection, which is inapplicable to 3D poses above the ground. In contrast, frameworks designed for pose tracking generally isolate multi-view and multi-frame associations and may not be sufficiently robust for footprint tracking, which utilizes fewer key points than pose tracking, weakening multi-view association cues in a single frame. This study presents a unified multi-view multi-person tracking framework to bridge the gap between footprint tracking and pose tracking. Without additional modifications, the framework can adopt monocular 2D bounding boxes and 2D poses as its input to produce robust 3D trajectories for multiple persons. Importantly, multi-frame and multi-view information are jointly employed to improve association and triangulation. Our framework is shown to provide state-of-the-art performance on the Campus and Shelf datasets for 3D pose tracking, with comparable results on the WILDTRACK and MMPTRACK datasets for 3D footprint tracking.
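The remark that a ground-plane homography is inapplicable to points above the ground can be illustrated with a small sketch. This is not the authors' method: the overhead camera matrix `P` and the helper names `project`, `ground_homography`, and `pixel_to_ground` are hypothetical, chosen only to show the geometry under a simple pinhole model.

```python
def project(P, X):
    """Project a 3D world point X = (x, y, z) to pixels with a 3x4 camera matrix P."""
    xh = [sum(P[r][c] * v for c, v in enumerate((*X, 1.0))) for r in range(3)]
    return (xh[0] / xh[2], xh[1] / xh[2])


def inverse_3x3(M):
    """Invert a 3x3 matrix via its adjugate."""
    (a, b, c), (d, e, f), (g, h, i) = M
    det = a * (e * i - f * h) - b * (d * i - f * g) + c * (d * h - e * g)
    if abs(det) < 1e-12:
        raise ValueError("matrix is singular")
    return [
        [(e * i - f * h) / det, (c * h - b * i) / det, (b * f - c * e) / det],
        [(f * g - d * i) / det, (a * i - c * g) / det, (c * d - a * f) / det],
        [(d * h - e * g) / det, (b * g - a * h) / det, (a * e - b * d) / det],
    ]


def ground_homography(P):
    """Image-to-ground homography for the plane z = 0.

    For z = 0 the projection reduces to columns 0, 1, 3 of P, so the
    inverse of that 3x3 matrix maps pixels back to ground coordinates.
    """
    return inverse_3x3([[row[0], row[1], row[3]] for row in P])


def pixel_to_ground(H, uv):
    """Apply the homography H to a pixel (u, v) and dehomogenize."""
    u, v = uv
    x, y, w = (H[r][0] * u + H[r][1] * v + H[r][2] for r in range(3))
    return (x / w, y / w)


# Assumed toy camera (not from the paper): 10 m above the world origin,
# looking straight down, focal length 1000 px, principal point (960, 540).
P = [
    [1000.0, 0.0, -960.0, 9600.0],
    [0.0, -1000.0, -540.0, 5400.0],
    [0.0, 0.0, -1.0, 10.0],
]
H = ground_homography(P)

foot = (2.0, 1.0, 0.0)  # a point on the ground plane
head = (2.0, 1.0, 1.7)  # same person, 1.7 m above the ground

foot_on_ground = pixel_to_ground(H, project(P, foot))
head_on_ground = pixel_to_ground(H, project(P, head))
print("foot maps to", foot_on_ground)
print("head maps to", head_on_ground)
```

With this toy camera the foot pixel maps back to its true ground position, while the head pixel lands roughly 0.46 m away from it, because the homography implicitly assumes every pixel depicts a point at height zero. Recovering key points above the ground therefore needs depth information from more than one view, for example by triangulation.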


Source journal

Computational Visual Media Computer Science-Computer Graphics and Computer-Aided Design

CiteScore

16.90

Self-citation rate

5.80%

Articles published

243

Review time

6 weeks

About the journal: Computational Visual Media is a peer-reviewed open access journal. It publishes original high-quality research papers and significant review articles on novel ideas, methods, and systems relevant to visual media. Computational Visual Media publishes articles that focus on, but are not limited to, the following areas:
• Editing and composition of visual media
• Geometric computing for images and video
• Geometry modeling and processing
• Machine learning for visual media
• Physically based animation
• Realistic rendering
• Recognition and understanding of visual media
• Visual computing for robotics
• Visualization and visual analytics
Other interdisciplinary research into visual media that combines aspects of computer graphics, computer vision, image and video processing, geometric computing, and machine learning is also within the journal's scope. This is an open access journal, published quarterly by Tsinghua University Press and Springer. The open access fees (article-processing charges) are fully sponsored by Tsinghua University, China. Authors can publish in the journal without any additional charges.