RockTrack: A 3D Robust Multi-Camera-Ken Multi-Object Tracking Framework

Xiaoyu Li, Peidong Li, Lijun Zhao, Dedong Liu, Jinghan Gao, Xian Wu, Yitao Wu, Dixiao Cui
arXiv - CS - Computer Vision and Pattern Recognition. Published 2024-09-18. DOI: arxiv-2409.11749 (https://doi.org/arxiv-2409.11749)

Abstract

3D Multi-Object Tracking (MOT) has improved significantly with the rapid advances in 3D object detection, particularly in cost-effective multi-camera setups. However, the prevalent end-to-end training approach for multi-camera trackers yields detector-specific models, which limits their versatility. Moreover, current generic trackers overlook the unique characteristics of multi-camera detectors, i.e., the unreliability of motion observations and the feasibility of using visual information. To address these challenges, we propose RockTrack, a 3D MOT method for multi-camera detectors. Following the Tracking-By-Detection framework, RockTrack is compatible with various off-the-shelf detectors. RockTrack incorporates a confidence-guided preprocessing module that extracts reliable motion and image observations, which live in distinct representation spaces, from a single detector. These observations are then fused in an association module that leverages geometric and appearance cues to minimize mismatches. The resulting matches are propagated through a staged estimation process, forming the basis for heuristic noise modeling. Additionally, we introduce a novel appearance similarity metric for explicitly characterizing object affinities in multi-camera settings. RockTrack achieves state-of-the-art performance on the nuScenes vision-only tracking leaderboard with 59.1% AMOTA while demonstrating impressive computational efficiency.
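The abstract does not give the details of the preprocessing, association, or similarity metric, so the following is only a generic tracking-by-detection sketch, not RockTrack's actual method. It assumes a hypothetical `Obs` record with a bird's-eye-view center, an appearance embedding, and a detector score; it filters detections by confidence, fuses a normalized center distance with a cosine appearance distance, and matches greedily (a simple stand-in chosen for brevity). All names, weights, and thresholds are illustrative assumptions.

```python
import math
from dataclasses import dataclass


@dataclass
class Obs:
    """Hypothetical observation: BEV center (m), appearance embedding, score."""
    center: tuple
    feat: tuple
    score: float = 1.0


def cosine_distance(a, b):
    """1 - cosine similarity; returns 1.0 if either vector has zero norm."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    if na == 0 or nb == 0:
        return 1.0
    return 1.0 - dot / (na * nb)


def fused_cost(track, det, w_geo=0.5, max_dist=5.0):
    """Weighted sum of a geometric and an appearance cost, each in [0, 1]."""
    geo = min(math.dist(track.center, det.center) / max_dist, 1.0)
    app = cosine_distance(track.feat, det.feat) / 2.0  # map [0, 2] to [0, 1]
    return w_geo * geo + (1.0 - w_geo) * app


def associate(tracks, dets, score_thresh=0.3, cost_thresh=0.6):
    """Greedy one-to-one matching over confidence-filtered detections.

    Returns a list of (track_index, detection_index) pairs.
    """
    # Confidence-guided filtering (assumed form): drop low-score detections.
    kept = [j for j, d in enumerate(dets) if d.score >= score_thresh]
    pairs = sorted(
        (fused_cost(t, dets[j]), i, j)
        for i, t in enumerate(tracks)
        for j in kept
    )
    matches, used_tracks, used_dets = [], set(), set()
    for cost, i, j in pairs:
        if cost > cost_thresh:
            break  # remaining pairs are even more expensive
        if i in used_tracks or j in used_dets:
            continue
        matches.append((i, j))
        used_tracks.add(i)
        used_dets.add(j)
    return matches
```

Fusing the two cues lets a pair with a noisy position estimate still match when its appearance agrees, which is the motivation the abstract gives for using visual information alongside motion observations. A production tracker would typically replace the greedy loop with an optimal assignment solver.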