Self-supervised multi-object tracking based on metric learning

IF 5 | Zone 2 (Computer Science) | Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE

Complex & Intelligent Systems | Pub Date: 2024-07-03 | DOI: 10.1007/s40747-024-01475-3

Xin Feng, Yan Liu, Hanzhi Yang, Xiaoning Jiao, Zhi Liu


Citations: 0

Abstract

The current paradigm of joint detection and tracking still requires a large amount of instance-level trajectory annotation, which incurs high annotation costs. Moreover, treating embedding training as a classification problem makes the model difficult to fit. In this paper, we propose a new self-supervised multi-object tracking method based on the real-time joint detection and embedding (JDE) framework, which we term self-supervised multi-object tracking (SS-MOT). In SS-MOT, the short-term temporal correlations between objects within and across adjacent video frames are both used as self-supervised constraints: the distances between different objects are enlarged, while the distances between the same object in adjacent frames are reduced. In addition, short trajectories are formed by matching objects across pairs of adjacent frames with a matching algorithm, and the matched pairs are treated as positive samples. The distances between positive samples are then minimized to further strengthen the feature representation of the same object. As a result, our method can be trained on videos without instance-level annotations. We apply our approach to state-of-the-art JDE models, such as FairMOT, CSTrack, and SiamMOT, and achieve results comparable to these supervised methods on the widely used MOT17 and MOT20 challenges.
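The two constraints described in the abstract (push different objects apart, pull matched objects in adjacent frames together) can be illustrated with a small NumPy sketch. This is not the authors' implementation: the abstract does not specify the loss form or the matching algorithm, so the hinge-style push term, the `margin` parameter, the mutual-nearest-neighbour matching, and all function names below are assumptions chosen only to make the idea concrete.

```python
import numpy as np


def l2_normalize(x, eps=1e-12):
    """Scale each row (one embedding per detected object) to unit length."""
    return x / np.maximum(np.linalg.norm(x, axis=1, keepdims=True), eps)


def pairwise_sq_dist(a, b):
    """Squared Euclidean distance between every row of a and every row of b."""
    diff = a[:, None, :] - b[None, :, :]
    return np.sum(diff * diff, axis=-1)


def mutual_nearest_matches(emb_t, emb_t1):
    """Assumed matching rule: keep (i, j) only if i and j pick each other."""
    if len(emb_t) == 0 or len(emb_t1) == 0:
        return []
    d = pairwise_sq_dist(emb_t, emb_t1)
    fwd = d.argmin(axis=1)  # nearest object in frame t+1 for each object in t
    bwd = d.argmin(axis=0)  # nearest object in frame t for each object in t+1
    return [(i, int(j)) for i, j in enumerate(fwd) if bwd[j] == i]


def ss_metric_loss(emb_t, emb_t1, margin=1.0):
    """Pull matched cross-frame pairs together, push same-frame objects apart."""
    emb_t, emb_t1 = l2_normalize(emb_t), l2_normalize(emb_t1)
    matches = mutual_nearest_matches(emb_t, emb_t1)

    # Pull term: mean distance between matched (positive) pairs.
    d_cross = pairwise_sq_dist(emb_t, emb_t1)
    pull = float(np.mean([d_cross[i, j] for i, j in matches])) if matches else 0.0

    # Push term: objects in the same frame are distinct, so penalize
    # any pair that is closer than the (assumed) margin.
    push_terms = []
    for emb in (emb_t, emb_t1):
        n = len(emb)
        if n < 2:
            continue
        d = pairwise_sq_dist(emb, emb)
        off_diag = d[~np.eye(n, dtype=bool)]
        push_terms.append(np.maximum(0.0, margin - off_diag).mean())
    push = float(np.mean(push_terms)) if push_terms else 0.0

    return pull + push, matches


# Two objects per frame; their order is swapped in the second frame.
frame_t = np.array([[1.0, 0.0], [0.0, 1.0]])
frame_t1 = np.array([[0.1, 0.9], [0.9, 0.1]])
loss, matches = ss_metric_loss(frame_t, frame_t1)
print(matches, loss)
```

In this toy case no identity labels are supplied: the positive pairs come only from the matching step, which is the sense in which training needs no instance-level annotation. A real tracker would compute these terms on the embedding head of a JDE network inside an autodiff framework rather than in NumPy.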




Source journal

Complex & Intelligent Systems (COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE)

CiteScore: 9.60

Self-citation rate: 10.30%

Articles published: 297

Journal description: Complex & Intelligent Systems aims to provide a forum for presenting and discussing novel approaches, tools and techniques meant for attaining a cross-fertilization between the broad fields of complex systems, computational simulation, and intelligent analytics and visualization. The transdisciplinary research that the journal focuses on will expand the boundaries of our understanding by investigating the principles and processes that underlie many of the most profound problems facing society today.
