Addressing Challenges of Incorporating Appearance Cues Into Heuristic Multi-Object Tracker via a Novel Feature Paradigm

IEEE Transactions on Image Processing: a publication of the IEEE Signal Processing Society. Pub Date: 2024-10-02. DOI: 10.1109/TIP.2024.3468901

Chongwei Liu; Haojie Li; Zhihui Wang; Rui Xu

Volume 33, pages 5727-5739. Publication type: Journal Article. Full text: https://ieeexplore.ieee.org/document/10704601/

Citations: 0

Abstract


Addressing Challenges of Incorporating Appearance Cues Into Heuristic Multi-Object Tracker via a Novel Feature Paradigm

In Multi-Object Tracking (MOT), incorporating appearance cues into tracking-by-detection heuristic trackers through re-identification (ReID) features has constrained progress. The existing ReID paradigm uses a ReID model to extract coarse-grained, object-level feature vectors from cropped objects at a fixed input size, and computes similarity with a simple normalized inner product. MOT, however, requires fine-grained features from different object regions and more accurate similarity measurements to identify individuals, especially under occlusion. To address these limitations, we propose a novel feature paradigm. We extract the feature map from the entire frame image to preserve object sizes, and represent each object by a set of fine-grained features from different object regions. These features are sampled from adaptive patches within the object bounding box on the feature map to capture local appearance cues. We introduce Mutual Ratio Similarity (MRS), which measures the similarity of the most discriminative region between two objects based on the sampled patches and proves effective in handling occlusion. We also propose absolute Intersection over Union (AIoU) to account for object sizes in the feature cost computation. We integrate our paradigm with advanced motion techniques to develop MoFe, a heuristic Motion-Feature joint multi-object tracker. Within MoFe, we reformulate the state transitions of tracklets to better model their life cycle, and we are the first to introduce a runtime recorder after MoFe to refine trajectories. Extensive experiments on five benchmarks (GMOT-40, BDD100k, DanceTrack, MOT17, and MOT20) demonstrate that MoFe achieves state-of-the-art performance in robustness and generalizability without any fine-tuning, and even surpasses the performance of fine-tuned ReID features.
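
For readers unfamiliar with the baseline the abstract criticizes, the "simple normalized inner product" is ordinary cosine similarity between two object-level feature vectors. The sketch below is only an illustration of that baseline, not the paper's MRS or AIoU, whose definitions are not given in the abstract. The function names, the toy feature values, and the `1 - similarity` cost convention are assumptions made for this example.

```python
import math


def normalized_inner_product(a, b):
    """Cosine similarity: inner product of the two L2-normalized vectors."""
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # convention assumed here for all-zero vectors
    return sum(x * y for x, y in zip(a, b)) / (norm_a * norm_b)


def appearance_cost_matrix(track_feats, det_feats):
    """Assumed cost convention: 1 - similarity per (track, detection) pair."""
    return [
        [1.0 - normalized_inner_product(t, d) for d in det_feats]
        for t in track_feats
    ]


# Toy 3-dimensional object-level features (made-up values, illustration only).
tracks = [[1.0, 0.0, 0.0], [0.0, 1.0, 1.0]]
detections = [[2.0, 0.0, 0.0], [0.0, 3.0, 3.0]]
costs = appearance_cost_matrix(tracks, detections)
```

Because each object is reduced to one vector, a partially occluded object has no way to be matched on its visible region alone, which is the limitation the patch-based paradigm in the abstract is meant to address.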
