{"title":"MMOT:运动感知多目标跟踪与光流","authors":"Haodong Liu, Tianyang Xu, Xiaojun Wu","doi":"10.1145/3581807.3581824","DOIUrl":null,"url":null,"abstract":"Modern multi-object tracking (MOT) benefited from recent advances in deep neural network and large video datasets. However, there are still some challenges impeding further improvement of the tracking performance, including complex background, fast motion and occlusion scenes. In this paper, we propose a new framework which employs motion information with optical flow, enable directly distinguishing the foreground and background regions. The proposed end-to-end network consists of two branches to separately model the spatial feature representations and optical flow motion patterns. We propose different fusion mechanism by combining the motion clues and appearance information. The results on MOT17 dataset show that our method is an effective mechanism in modeling temporal-spatial information.","PeriodicalId":292813,"journal":{"name":"Proceedings of the 2022 11th International Conference on Computing and Pattern Recognition","volume":"30 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2022-11-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"MMOT: Motion-Aware Multi-Object Tracking with Optical Flow\",\"authors\":\"Haodong Liu, Tianyang Xu, Xiaojun Wu\",\"doi\":\"10.1145/3581807.3581824\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Modern multi-object tracking (MOT) benefited from recent advances in deep neural network and large video datasets. However, there are still some challenges impeding further improvement of the tracking performance, including complex background, fast motion and occlusion scenes. In this paper, we propose a new framework which employs motion information with optical flow, enable directly distinguishing the foreground and background regions. The proposed end-to-end network consists of two branches to separately model the spatial feature representations and optical flow motion patterns. We propose different fusion mechanism by combining the motion clues and appearance information. 
The results on MOT17 dataset show that our method is an effective mechanism in modeling temporal-spatial information.\",\"PeriodicalId\":292813,\"journal\":{\"name\":\"Proceedings of the 2022 11th International Conference on Computing and Pattern Recognition\",\"volume\":\"30 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2022-11-17\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Proceedings of the 2022 11th International Conference on Computing and Pattern Recognition\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1145/3581807.3581824\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Proceedings of the 2022 11th International Conference on Computing and Pattern Recognition","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1145/3581807.3581824","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
MMOT: Motion-Aware Multi-Object Tracking with Optical Flow
Modern multi-object tracking (MOT) has benefited from recent advances in deep neural networks and large video datasets. However, several challenges still impede further improvement of tracking performance, including complex backgrounds, fast motion, and occlusion. In this paper, we propose a new framework that exploits motion information from optical flow, enabling the foreground and background regions to be distinguished directly. The proposed end-to-end network consists of two branches that separately model spatial feature representations and optical-flow motion patterns. We propose different fusion mechanisms for combining the motion cues with the appearance information. Results on the MOT17 dataset show that our method is effective in modeling spatio-temporal information.
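The abstract gives no implementation details, but the two-branch design it describes can be illustrated concretely. The following is a minimal PyTorch sketch, assuming one branch encodes appearance from the RGB frame, the other encodes a precomputed 2-channel optical-flow field, and the two are fused by channel concatenation followed by a 1x1 convolution (one plausible fusion mechanism among the several the paper mentions). All module names, backbone choices, and dimensions are illustrative assumptions, not the authors' code.

```python
# Minimal sketch of a two-branch appearance/flow network (illustrative only;
# the shallow conv stacks and the concatenation fusion are assumptions,
# not the authors' implementation).
import torch
import torch.nn as nn

class TwoBranchTracker(nn.Module):
    def __init__(self, feat_dim: int = 128):
        super().__init__()
        # Appearance branch: encodes spatial features from the RGB frame.
        self.appearance = nn.Sequential(
            nn.Conv2d(3, 64, kernel_size=3, stride=2, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(64, feat_dim, kernel_size=3, stride=2, padding=1),
            nn.ReLU(inplace=True),
        )
        # Motion branch: encodes a 2-channel optical-flow field (dx, dy),
        # e.g. precomputed between frames t-1 and t by an off-the-shelf
        # flow estimator.
        self.motion = nn.Sequential(
            nn.Conv2d(2, 64, kernel_size=3, stride=2, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(64, feat_dim, kernel_size=3, stride=2, padding=1),
            nn.ReLU(inplace=True),
        )
        # Fusion: concatenate along channels, then mix with a 1x1 conv.
        self.fuse = nn.Conv2d(2 * feat_dim, feat_dim, kernel_size=1)

    def forward(self, frame: torch.Tensor, flow: torch.Tensor) -> torch.Tensor:
        app = self.appearance(frame)  # (B, feat_dim, H/4, W/4)
        mot = self.motion(flow)       # (B, feat_dim, H/4, W/4)
        return self.fuse(torch.cat([app, mot], dim=1))

# Usage: the fused features would feed downstream detection and
# association heads of the tracker.
frame = torch.randn(1, 3, 256, 256)  # current RGB frame
flow = torch.randn(1, 2, 256, 256)   # optical flow for the same frame pair
features = TwoBranchTracker()(frame, flow)
print(features.shape)  # torch.Size([1, 128, 64, 64])
```

Because the motion branch sees flow magnitudes directly, static background regions (near-zero flow) and moving foreground objects produce distinct activations, which is one way the fused representation could support the foreground/background separation the abstract claims.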