Yuzhuo Xu, Ting Li, Bing Zhu, Fasheng Wang, Fuming Sun
{"title":"具有多重关注机制的连体追踪网络","authors":"Yuzhuo Xu, Ting Li, Bing Zhu, Fasheng Wang, Fuming Sun","doi":"10.1007/s11063-024-11670-5","DOIUrl":null,"url":null,"abstract":"<p>Object trackers based on Siamese networks view tracking as a similarity-matching process. However, the correlation operation operates as a local linear matching process, limiting the tracker’s ability to capture the intricate nonlinear relationship between the template and search region branches. Moreover, most trackers don’t update the template and often use the first frame of an image as the initial template, which will easily lead to poor tracking performance of the algorithm when facing instances of deformation, scale variation, and occlusion of the tracking target. To this end, we propose a Simases tracking network with a multi-attention mechanism, including a template branch and a search branch. To adapt to changes in target appearance, we integrate dynamic templates and multi-attention mechanisms in the template branch to obtain more effective feature representation by fusing the features of initial templates and dynamic templates. To enhance the robustness of the tracking model, we utilize a multi-attention mechanism in the search branch that shares weights with the template branch to obtain multi-scale feature representation by fusing search region features at different scales. In addition, we design a lightweight and simple feature fusion mechanism, in which the Transformer encoder structure is utilized to fuse the information of the template area and search area, and the dynamic template is updated online based on confidence. Experimental results on publicly tracking datasets show that the proposed method achieves competitive results compared to several state-of-the-art trackers.</p>","PeriodicalId":51144,"journal":{"name":"Neural Processing Letters","volume":"41 1","pages":""},"PeriodicalIF":2.6000,"publicationDate":"2024-08-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"Siamese Tracking Network with Multi-attention Mechanism\",\"authors\":\"Yuzhuo Xu, Ting Li, Bing Zhu, Fasheng Wang, Fuming Sun\",\"doi\":\"10.1007/s11063-024-11670-5\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"<p>Object trackers based on Siamese networks view tracking as a similarity-matching process. However, the correlation operation operates as a local linear matching process, limiting the tracker’s ability to capture the intricate nonlinear relationship between the template and search region branches. Moreover, most trackers don’t update the template and often use the first frame of an image as the initial template, which will easily lead to poor tracking performance of the algorithm when facing instances of deformation, scale variation, and occlusion of the tracking target. To this end, we propose a Simases tracking network with a multi-attention mechanism, including a template branch and a search branch. To adapt to changes in target appearance, we integrate dynamic templates and multi-attention mechanisms in the template branch to obtain more effective feature representation by fusing the features of initial templates and dynamic templates. To enhance the robustness of the tracking model, we utilize a multi-attention mechanism in the search branch that shares weights with the template branch to obtain multi-scale feature representation by fusing search region features at different scales. In addition, we design a lightweight and simple feature fusion mechanism, in which the Transformer encoder structure is utilized to fuse the information of the template area and search area, and the dynamic template is updated online based on confidence. Experimental results on publicly tracking datasets show that the proposed method achieves competitive results compared to several state-of-the-art trackers.</p>\",\"PeriodicalId\":51144,\"journal\":{\"name\":\"Neural Processing Letters\",\"volume\":\"41 1\",\"pages\":\"\"},\"PeriodicalIF\":2.6000,\"publicationDate\":\"2024-08-23\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Neural Processing Letters\",\"FirstCategoryId\":\"94\",\"ListUrlMain\":\"https://doi.org/10.1007/s11063-024-11670-5\",\"RegionNum\":4,\"RegionCategory\":\"计算机科学\",\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"Q3\",\"JCRName\":\"COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Neural Processing Letters","FirstCategoryId":"94","ListUrlMain":"https://doi.org/10.1007/s11063-024-11670-5","RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q3","JCRName":"COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE","Score":null,"Total":0}
Siamese Tracking Network with Multi-attention Mechanism
Object trackers based on Siamese networks view tracking as a similarity-matching process. However, the correlation operation operates as a local linear matching process, limiting the tracker’s ability to capture the intricate nonlinear relationship between the template and search region branches. Moreover, most trackers don’t update the template and often use the first frame of an image as the initial template, which will easily lead to poor tracking performance of the algorithm when facing instances of deformation, scale variation, and occlusion of the tracking target. To this end, we propose a Simases tracking network with a multi-attention mechanism, including a template branch and a search branch. To adapt to changes in target appearance, we integrate dynamic templates and multi-attention mechanisms in the template branch to obtain more effective feature representation by fusing the features of initial templates and dynamic templates. To enhance the robustness of the tracking model, we utilize a multi-attention mechanism in the search branch that shares weights with the template branch to obtain multi-scale feature representation by fusing search region features at different scales. In addition, we design a lightweight and simple feature fusion mechanism, in which the Transformer encoder structure is utilized to fuse the information of the template area and search area, and the dynamic template is updated online based on confidence. Experimental results on publicly tracking datasets show that the proposed method achieves competitive results compared to several state-of-the-art trackers.
期刊介绍:
Neural Processing Letters is an international journal publishing research results and innovative ideas on all aspects of artificial neural networks. Coverage includes theoretical developments, biological models, new formal modes, learning, applications, software and hardware developments, and prospective researches.
The journal promotes fast exchange of information in the community of neural network researchers and users. The resurgence of interest in the field of artificial neural networks since the beginning of the 1980s is coupled to tremendous research activity in specialized or multidisciplinary groups. Research, however, is not possible without good communication between people and the exchange of information, especially in a field covering such different areas; fast communication is also a key aspect, and this is the reason for Neural Processing Letters