Weiyu Zhao, Yizhuo Jiang, Yan Gao, Jie Li, Xinbo Gao
Title: DETrack: Depth information is predictable for tracking
Journal: Neurocomputing, volume 616, Article 128906 (JCR Q1, Computer Science, Artificial Intelligence; impact factor 5.5)
DOI: 10.1016/j.neucom.2024.128906
Published: 2024-11-16
URL: https://www.sciencedirect.com/science/article/pii/S0925231224016771
Citation count: 0
Abstract
The purpose of multi-object tracking is to estimate both the bounding boxes of targets and their identities. However, occlusion caused by object interactions often leads to identity switches and trajectory loss. Inspired by the three-dimensional tracking properties of human vision, we propose a tracking framework based on depth estimation, called DETrack, to address this issue. The framework features a Depth Information Module (DIM) that operates under monocular conditions and produces depth features as an association cue for multi-object tracking. In addition, to actively retrieve information lost in trajectories, we put forward a "refind" component, which echoes how human vision compensates for objects that are out of sight. Our framework can integrate seamlessly with most trackers, introducing an entirely new data dimension to the tracking task. We have tested DETrack on the MOT17 and DanceTrack benchmark datasets and compared it with alternative methods. The test results demonstrate that our technique works effectively with current MOT trackers and significantly improves tracking results on the HOTA, IDF1, and MOTA metrics on both datasets.
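To make the idea of depth as an association cue concrete, the following is a minimal sketch and not the authors' DIM or matching procedure. It assumes each track and detection already carries a scalar depth estimate, and it blends an IoU cost with a normalised depth difference. The names `association_cost` and `greedy_match`, and the parameters `depth_weight`, `depth_scale`, and `max_cost`, are hypothetical and chosen only for illustration.

```python
def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def association_cost(track, det, depth_weight=0.5, depth_scale=10.0):
    """Blend an IoU cost with a clipped depth-difference cost (assumed form)."""
    iou_cost = 1.0 - iou(track["box"], det["box"])
    depth_cost = min(abs(track["depth"] - det["depth"]) / depth_scale, 1.0)
    return (1.0 - depth_weight) * iou_cost + depth_weight * depth_cost


def greedy_match(tracks, dets, max_cost=0.7, **kwargs):
    """Pair tracks and detections greedily in order of increasing cost."""
    pairs = sorted(
        (association_cost(t, d, **kwargs), ti, di)
        for ti, t in enumerate(tracks)
        for di, d in enumerate(dets)
    )
    used_tracks, used_dets, matches = set(), set(), []
    for cost, ti, di in pairs:
        if cost > max_cost:
            break  # remaining pairs are even more expensive
        if ti in used_tracks or di in used_dets:
            continue
        used_tracks.add(ti)
        used_dets.add(di)
        matches.append((ti, di))
    return matches


# Two people overlapping in the image plane but standing at different depths
# (all values are made up for illustration).
tracks = [
    {"box": (100, 100, 200, 300), "depth": 5.0},
    {"box": (110, 100, 210, 300), "depth": 12.0},
]
dets = [
    {"box": (104, 100, 204, 300), "depth": 12.2},
    {"box": (108, 100, 208, 300), "depth": 5.1},
]

iou_only = greedy_match(tracks, dets, depth_weight=0.0)
with_depth = greedy_match(tracks, dets)
print("IoU only:  ", sorted(iou_only))
print("With depth:", sorted(with_depth))
```

In this toy scene the boxes overlap so heavily that IoU alone pairs each track with the wrong detection, while the depth term separates them. Practical trackers typically solve the assignment with the Hungarian algorithm rather than the greedy loop used here, which is kept only for brevity.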
About the journal:
Neurocomputing publishes articles describing recent fundamental contributions in the field of neurocomputing. Neurocomputing theory, practice and applications are the essential topics being covered.