Tianci Zhao, Changwen Zheng, Qingmeng Zhu, Hao He
2022 21st International Symposium on Communications and Information Technologies (ISCIT), published 2022-09-27
DOI: 10.1109/ISCIT55906.2022.9931284
SwinTransTrack: Multi-object Tracking Using Shifted Window Transformers
With the great popularity of Transformers, many works have used them to explore the temporal association of objects across video frames. However, due to the large scale variation of visual entities and the high pixel resolution of images, the original Transformer is slow in both training and inference. Based on Swin Transformer, we propose SwinTransTrack, a novel shifted-window encoder-decoder model. Unlike the original model, we fuse low-rank adaptation to enhance the feature dimensions and propose a new shifted-window decoder network that obtains accurate displacements for associating trajectories. Finally, we conducted extensive quantitative experiments on two MOT datasets, MOT17 and MOT20. The results show that SwinTransTrack achieves 75.5 MOTA on MOT17 and 67.5 MOTA on MOT20, leading both MOT benchmarks.
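The abstract does not specify how its low-rank adaptation is fused into the model. As a minimal sketch of the general technique (following the standard LoRA formulation, not the paper's exact design), a frozen weight matrix W is augmented with a trainable low-rank update BA, where only A and B are learned; all names and shapes below are illustrative assumptions:

```python
import numpy as np

def low_rank_adapt(W, A, B, x, alpha=1.0):
    """Apply a frozen linear layer W plus a low-rank update B @ A.

    W: (d_out, d_in) frozen pretrained weight
    A: (r, d_in), B: (d_out, r) trainable low-rank factors, r << d_in
    alpha: scaling factor for the adaptation term
    """
    return W @ x + alpha * (B @ (A @ x))

rng = np.random.default_rng(0)
d_out, d_in, r = 8, 16, 2
W = rng.standard_normal((d_out, d_in))
A = rng.standard_normal((r, d_in))
B = np.zeros((d_out, r))  # B starts at zero, so the adapter is initially a no-op
x = rng.standard_normal(d_in)

# With B = 0, the adapted layer reproduces the frozen layer exactly
assert np.allclose(low_rank_adapt(W, A, B, x), W @ x)
```

The low-rank factors add only r * (d_in + d_out) trainable parameters per layer, which is why this style of adaptation is attractive for large Transformer backbones.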