STCMOT: Spatio-Temporal Cohesion Learning for UAV-Based Multiple Object Tracking
Jianbo Ma, Chuanming Tang, Fei Wu, Can Zhao, Jianlin Zhang, Zhiyong Xu
arXiv - CS - Computer Vision and Pattern Recognition, 2024-09-17, arXiv:2409.11234
Multiple object tracking (MOT) in Unmanned Aerial Vehicle (UAV) videos is
important for diverse applications in computer vision. Current MOT trackers
rely on accurate object detection results and precise matching of target
reidentification (ReID). These methods focus on optimizing target spatial
attributes while overlooking temporal cues in modelling object relationships,
especially under challenging tracking conditions such as object deformation and
blurring. To address these issues, we propose a novel
Spatio-Temporal Cohesion Multiple Object Tracking framework (STCMOT), which
utilizes historical embedding features to model the representation of ReID and
detection features in sequential order. Concretely, a temporal embedding
boosting module is introduced to enhance the discriminability of individual
embeddings through cooperation between adjacent frames. The trajectory embedding
is then propagated by a temporal detection refinement module to mine salient
target locations in the temporal field. Extensive experiments on the
VisDrone2019 and UAVDT datasets demonstrate that STCMOT achieves new
state-of-the-art performance in the MOTA and IDF1 metrics. The source code is
released at https://github.com/ydhcg-BoBo/STCMOT.
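
For readers unfamiliar with the two metrics named above, the following is a minimal sketch of their standard definitions (MOTA from the CLEAR MOT metrics, IDF1 from identity-based matching). It assumes the error counts have already been accumulated over a sequence. The function names and the example counts are illustrative assumptions; they are not taken from the STCMOT repository and are not results from the paper.

```python
def mota(false_negatives, false_positives, id_switches, num_gt):
    """MOTA = 1 - (FN + FP + IDSW) / GT, with counts summed over all frames."""
    if num_gt <= 0:
        raise ValueError("num_gt must be positive")
    return 1.0 - (false_negatives + false_positives + id_switches) / num_gt


def idf1(idtp, idfp, idfn):
    """IDF1 = 2*IDTP / (2*IDTP + IDFP + IDFN), the F1 score over identity matches."""
    denom = 2 * idtp + idfp + idfn
    return 2 * idtp / denom if denom else 0.0


# Made-up counts for illustration only (hypothetical, not from the paper).
example_mota = mota(false_negatives=200, false_positives=100,
                    id_switches=20, num_gt=1000)
example_idf1 = idf1(idtp=700, idfp=150, idfn=300)
print(f"MOTA={example_mota:.3f}, IDF1={example_idf1:.3f}")
```

MOTA penalizes detection errors and identity switches together, while IDF1 rewards keeping the same identity on a target over time, which is why trackers are usually reported on both.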