Yi Cao, H. Ji, Wenbo Zhang, S. Shirani
2019 22nd International Conference on Information Fusion (FUSION), July 2019
DOI: 10.23919/fusion43075.2019.9011338
Extremely Tiny Siamese Networks with Multi-level Fusions for Visual Object Tracking
Siamese architectures have substantially improved visual object tracking in recent years. Despite their influence, little work has focused on designing tiny networks for tracking. In this paper, we propose a novel tiny Siamese (TinySiam) architecture with extremely few parameters and computations. Owing to its low computational requirements, the tracker runs at extremely high speed and can potentially be deployed directly on embedded devices. To make the tiny network efficient, we first exploit layer-level fusion between different layers by concatenating their features within the building block, which ensures information reuse. Second, we apply channel shuffle and channel split operations to achieve channel-level feature fusion across different convolution groups, which increases information interaction between groups. Third, we employ depth-wise convolution to effectively reduce the number of convolution parameters, which greatly benefits fast tracking. The final network (24K parameters and 59M FLOPs) drastically lowers model complexity. Experimental results on the GOT-10k and DTB70 benchmarks, covering both ordinary and aerial tracking, demonstrate the real-time speed (129 FPS on GOT-10k and 166 FPS on DTB70) and the robust tracking performance of our TinySiam tracker.
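To make the two efficiency mechanisms named in the abstract concrete, the sketch below illustrates (a) why depth-wise separable convolution shrinks parameter counts relative to a standard convolution, and (b) how a channel shuffle interleaves channels across convolution groups. The channel counts, kernel size, and group count here are illustrative placeholders, not values taken from the TinySiam paper.

```python
# Parameter-count comparison and channel shuffle, sketched in plain Python.

def standard_conv_params(c_in, c_out, k):
    # A standard conv: each of c_out filters spans all c_in channels with a k*k kernel.
    return c_in * c_out * k * k

def depthwise_separable_params(c_in, c_out, k):
    # Depth-wise: one k*k filter per input channel (c_in * k * k params),
    # followed by a 1x1 point-wise conv that mixes channels (c_in * c_out params).
    return c_in * k * k + c_in * c_out

def channel_shuffle(channels, groups):
    # Interleave channels across groups so the next grouped convolution
    # sees features from every group: reshape (groups, n) -> transpose -> flatten.
    n = len(channels) // groups
    return [channels[g * n + i] for i in range(n) for g in range(groups)]

print(standard_conv_params(64, 64, 3))         # 36864
print(depthwise_separable_params(64, 64, 3))   # 4672
print(channel_shuffle(list(range(8)), 2))      # [0, 4, 1, 5, 2, 6, 3, 7]
```

For this illustrative 64-channel, 3x3 case the separable form needs roughly 8x fewer parameters, which is the kind of saving that lets a network fit in the 24K-parameter budget the abstract reports.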