{"title":"多目标跟踪从外观分层聚类跟踪","authors":"Andreu Girbau, F. Marqu'es, Shin’ichi Satoh","doi":"10.48550/arXiv.2210.03355","DOIUrl":null,"url":null,"abstract":"Current approaches in Multiple Object Tracking (MOT) rely on the spatio-temporal coherence between detections combined with object appearance to match objects from consecutive frames. In this work, we explore MOT using object appearances as the main source of association between objects in a video, using spatial and temporal priors as weighting factors. We form initial tracklets by leveraging on the idea that instances of an object that are close in time should be similar in appearance, and build the final object tracks by fusing the tracklets in a hierarchical fashion. We conduct extensive experiments that show the effectiveness of our method over three different MOT benchmarks, MOT17, MOT20, and DanceTrack, being competitive in MOT17 and MOT20 and establishing state-of-the-art results in DanceTrack.","PeriodicalId":72437,"journal":{"name":"BMVC : proceedings of the British Machine Vision Conference. British Machine Vision Conference","volume":"16 1","pages":"362"},"PeriodicalIF":0.0000,"publicationDate":"2022-10-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"3","resultStr":"{\"title\":\"Multiple Object Tracking from appearance by hierarchically clustering tracklets\",\"authors\":\"Andreu Girbau, F. Marqu'es, Shin’ichi Satoh\",\"doi\":\"10.48550/arXiv.2210.03355\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Current approaches in Multiple Object Tracking (MOT) rely on the spatio-temporal coherence between detections combined with object appearance to match objects from consecutive frames. In this work, we explore MOT using object appearances as the main source of association between objects in a video, using spatial and temporal priors as weighting factors. We form initial tracklets by leveraging on the idea that instances of an object that are close in time should be similar in appearance, and build the final object tracks by fusing the tracklets in a hierarchical fashion. We conduct extensive experiments that show the effectiveness of our method over three different MOT benchmarks, MOT17, MOT20, and DanceTrack, being competitive in MOT17 and MOT20 and establishing state-of-the-art results in DanceTrack.\",\"PeriodicalId\":72437,\"journal\":{\"name\":\"BMVC : proceedings of the British Machine Vision Conference. British Machine Vision Conference\",\"volume\":\"16 1\",\"pages\":\"362\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2022-10-07\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"3\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"BMVC : proceedings of the British Machine Vision Conference. British Machine Vision Conference\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.48550/arXiv.2210.03355\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"BMVC : proceedings of the British Machine Vision Conference. British Machine Vision Conference","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.48550/arXiv.2210.03355","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Multiple Object Tracking from appearance by hierarchically clustering tracklets
Current approaches in Multiple Object Tracking (MOT) rely on the spatio-temporal coherence between detections combined with object appearance to match objects from consecutive frames. In this work, we explore MOT using object appearances as the main source of association between objects in a video, using spatial and temporal priors as weighting factors. We form initial tracklets by leveraging on the idea that instances of an object that are close in time should be similar in appearance, and build the final object tracks by fusing the tracklets in a hierarchical fashion. We conduct extensive experiments that show the effectiveness of our method over three different MOT benchmarks, MOT17, MOT20, and DanceTrack, being competitive in MOT17 and MOT20 and establishing state-of-the-art results in DanceTrack.