{"title":"通过新颖的特征范式应对将外观线索纳入启发式多目标跟踪器的挑战","authors":"Chongwei Liu;Haojie Li;Zhihui Wang;Rui Xu","doi":"10.1109/TIP.2024.3468901","DOIUrl":null,"url":null,"abstract":"In the field of Multi-Object Tracking (MOT), the incorporation of appearance cues into tracking-by-detection heuristic trackers using re-identification (ReID) features has posed limitations on its advancement. The existing ReID paradigm involves the extraction of coarse-grained object-level feature vectors from cropped objects at a fixed input size using a ReID model, and similarity computation through a simple normalized inner product. However, MOT requires fine-grained features from different object regions and more accurate similarity measurements to identify individuals, especially in the presence of occlusion. To address these limitations, we propose a novel feature paradigm. In this paradigm, we extract the feature map from the entire frame image to preserve object sizes and represent objects using a set of fine-grained features from different object regions. These features are sampled from adaptive patches within the object bounding box on the feature map to effectively capture local appearance cues. We introduce Mutual Ratio Similarity (MRS) to accurately measure the similarity of the most discriminative region between two objects based on the sampled patches, which proves effective in handling occlusion. Moreover, we propose absolute Intersection over Union (AIoU) to consider object sizes in feature cost computation. We integrate our paradigm with advanced motion techniques to develop a heuristic Motion-Feature joint multi-object tracker, MoFe. Within it, we reformulate the track state transition of tracklets to better model their life cycle, and firstly introduce a runtime recorder after MoFe to refine trajectories. 
Extensive experiments on five benchmarks, i.e., GMOT-40, BDD100k, DanceTrack, MOT17, and MOT20, demonstrate that MoFe achieves state-of-the-art performance in robustness and generalizability without any fine-tuning, and even surpasses the performance of fine-tuned ReID features.","PeriodicalId":94032,"journal":{"name":"IEEE transactions on image processing : a publication of the IEEE Signal Processing Society","volume":"33 ","pages":"5727-5739"},"PeriodicalIF":0.0000,"publicationDate":"2024-10-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"Addressing Challenges of Incorporating Appearance Cues Into Heuristic Multi-Object Tracker via a Novel Feature Paradigm\",\"authors\":\"Chongwei Liu;Haojie Li;Zhihui Wang;Rui Xu\",\"doi\":\"10.1109/TIP.2024.3468901\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"In the field of Multi-Object Tracking (MOT), the incorporation of appearance cues into tracking-by-detection heuristic trackers using re-identification (ReID) features has posed limitations on its advancement. The existing ReID paradigm involves the extraction of coarse-grained object-level feature vectors from cropped objects at a fixed input size using a ReID model, and similarity computation through a simple normalized inner product. However, MOT requires fine-grained features from different object regions and more accurate similarity measurements to identify individuals, especially in the presence of occlusion. To address these limitations, we propose a novel feature paradigm. In this paradigm, we extract the feature map from the entire frame image to preserve object sizes and represent objects using a set of fine-grained features from different object regions. These features are sampled from adaptive patches within the object bounding box on the feature map to effectively capture local appearance cues. 
We introduce Mutual Ratio Similarity (MRS) to accurately measure the similarity of the most discriminative region between two objects based on the sampled patches, which proves effective in handling occlusion. Moreover, we propose absolute Intersection over Union (AIoU) to consider object sizes in feature cost computation. We integrate our paradigm with advanced motion techniques to develop a heuristic Motion-Feature joint multi-object tracker, MoFe. Within it, we reformulate the track state transition of tracklets to better model their life cycle, and firstly introduce a runtime recorder after MoFe to refine trajectories. Extensive experiments on five benchmarks, i.e., GMOT-40, BDD100k, DanceTrack, MOT17, and MOT20, demonstrate that MoFe achieves state-of-the-art performance in robustness and generalizability without any fine-tuning, and even surpasses the performance of fine-tuned ReID features.\",\"PeriodicalId\":94032,\"journal\":{\"name\":\"IEEE transactions on image processing : a publication of the IEEE Signal Processing Society\",\"volume\":\"33 \",\"pages\":\"5727-5739\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2024-10-02\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"IEEE transactions on image processing : a publication of the IEEE Signal Processing Society\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://ieeexplore.ieee.org/document/10704601/\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"IEEE transactions on image processing : a publication of the IEEE Signal Processing 
Society","FirstCategoryId":"1085","ListUrlMain":"https://ieeexplore.ieee.org/document/10704601/","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Addressing Challenges of Incorporating Appearance Cues Into Heuristic Multi-Object Tracker via a Novel Feature Paradigm
In Multi-Object Tracking (MOT), the incorporation of appearance cues into tracking-by-detection heuristic trackers via re-identification (ReID) features has been held back by the existing ReID paradigm: a ReID model extracts coarse-grained, object-level feature vectors from objects cropped and resized to a fixed input size, and similarity is computed by a simple normalized inner product. MOT, however, requires fine-grained features from different object regions and more accurate similarity measurement to distinguish individuals, especially in the presence of occlusion. To address these limitations, we propose a novel feature paradigm. In this paradigm, we extract the feature map from the entire frame image to preserve object sizes, and we represent each object by a set of fine-grained features from different object regions. These features are sampled from adaptive patches within the object's bounding box on the feature map to effectively capture local appearance cues. We introduce Mutual Ratio Similarity (MRS) to accurately measure the similarity of the most discriminative region between two objects based on the sampled patches, which proves effective in handling occlusion. Moreover, we propose absolute Intersection over Union (AIoU) to account for object sizes in the feature cost computation. Integrating this paradigm with advanced motion techniques, we develop a heuristic Motion-Feature joint multi-object tracker, MoFe. Within it, we reformulate the track state transitions of tracklets to better model their life cycle, and we are the first to introduce a runtime recorder, applied after MoFe, to refine trajectories. Extensive experiments on five benchmarks (GMOT-40, BDD100k, DanceTrack, MOT17, and MOT20) demonstrate that MoFe achieves state-of-the-art robustness and generalizability without any fine-tuning, and even surpasses fine-tuned ReID features.
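The contrast the abstract draws — crop-and-resize object-level ReID vectors versus fine-grained patch features sampled from a frame-level feature map — can be illustrated with a minimal sketch. The grid layout, average pooling, and max-over-patch-pairs scoring below are assumptions for illustration only: the paper's actual adaptive patch sampling and its Mutual Ratio Similarity (MRS) are not defined in the abstract, so `patch_similarity` is a crude stand-in, not the authors' method.

```python
import numpy as np

def crop_free_patch_features(feature_map, box, grid=(3, 3)):
    """Sample a grid of patch features inside `box` directly on a
    frame-level feature map of shape (C, H, W), instead of re-running
    a ReID model on a resized crop. Each patch is average-pooled and
    L2-normalized. Grid size and pooling are illustrative assumptions."""
    c, h, w = feature_map.shape
    x1, y1, x2, y2 = box
    ys = np.linspace(y1, y2, grid[0] + 1)
    xs = np.linspace(x1, x2, grid[1] + 1)
    feats = []
    for i in range(grid[0]):
        for j in range(grid[1]):
            # Slice one patch; guard against zero-height/width slices.
            y_lo, y_hi = int(ys[i]), max(int(ys[i + 1]), int(ys[i]) + 1)
            x_lo, x_hi = int(xs[j]), max(int(xs[j + 1]), int(xs[j]) + 1)
            patch = feature_map[:, y_lo:y_hi, x_lo:x_hi]
            v = patch.mean(axis=(1, 2))                    # (C,) pooled feature
            feats.append(v / (np.linalg.norm(v) + 1e-12))  # unit-normalize
    return np.stack(feats)  # (grid_h * grid_w, C)

def patch_similarity(feats_a, feats_b):
    """Score two objects by the best normalized inner product over all
    patch pairs -- a hypothetical stand-in for MRS, which instead
    compares the most discriminative region via mutual ratios."""
    return float((feats_a @ feats_b.T).max())
```

Because the patches are sampled at the object's native scale on the shared feature map, no per-object resizing distorts size information, and a strong match in any single region (e.g., the unoccluded part of a partially hidden person) can still drive the score.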