{"title":"Learning nested attentional feature fusion network for high performance visual tracking","authors":"Peng Gao, Xin-Yue Zhang, Tao Yu","doi":"10.1007/s10489-025-06588-x","DOIUrl":null,"url":null,"abstract":"<div><p>Siamese network-based visual tracking has made significant progress in recent years, with correlation calculations playing a central role in these models. However, the inherently linear and localized nature of correlation often leads to substantial semantic information loss and convergence to local optima, thereby limiting the potential for further performance improvements. To address these challenges, we propose a feature fusion network inspired by the Transformer architecture, incorporating nested attention mechanisms to enhance tracking accuracy and robustness. Unlike standard Transformer-based models, our approach refines correlation accuracy by emphasizing correct matches while attenuating incorrect ones through nested attentional representation learning. This enables more effective feature aggregation and information propagation. Our feature fusion network consists of four interdependent modules: ego-context augmentation, short-term feature augmentation, long-term feature augmentation, and cross-feature augmentation. These modules collaboratively fuse features from target templates and search regions, producing semantically rich feature maps superior to those generated by traditional correlation methods. 
Built on this framework, our proposed model, AiATransT, achieves state-of-the-art performance on five benchmark datasets, validated by extensive experimental evaluations.</p></div>","PeriodicalId":8041,"journal":{"name":"Applied Intelligence","volume":"55 10","pages":""},"PeriodicalIF":3.4000,"publicationDate":"2025-04-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Applied Intelligence","FirstCategoryId":"94","ListUrlMain":"https://link.springer.com/article/10.1007/s10489-025-06588-x","RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q2","JCRName":"COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE","Score":null,"Total":0}
Citations: 0
Abstract
Siamese network-based visual tracking has made significant progress in recent years, with correlation calculations playing a central role in these models. However, the inherently linear and localized nature of correlation often leads to substantial semantic information loss and convergence to local optima, thereby limiting the potential for further performance improvements. To address these challenges, we propose a feature fusion network inspired by the Transformer architecture, incorporating nested attention mechanisms to enhance tracking accuracy and robustness. Unlike standard Transformer-based models, our approach refines correlation accuracy by emphasizing correct matches while attenuating incorrect ones through nested attentional representation learning. This enables more effective feature aggregation and information propagation. Our feature fusion network consists of four interdependent modules: ego-context augmentation, short-term feature augmentation, long-term feature augmentation, and cross-feature augmentation. These modules collaboratively fuse features from target templates and search regions, producing semantically rich feature maps superior to those generated by traditional correlation methods. Built on this framework, our proposed model, AiATransT, achieves state-of-the-art performance on five benchmark datasets, validated by extensive experimental evaluations.
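The contrast the abstract draws between linear correlation and attention-based fusion can be illustrated with a toy sketch. This is not the AiATransT implementation: it shows only generic scaled dot-product cross-attention, without learned projections, multiple heads, or the nested (inner) attention the paper proposes. The function names `cross_correlation` and `cross_attention_fuse` and the toy feature values are hypothetical, chosen for illustration.

```python
import math


def softmax(row):
    """Numerically stable softmax over a list of floats."""
    m = max(row)
    exps = [math.exp(v - m) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a, b):
    """Inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def cross_correlation(search, template):
    """Linear matching: one raw similarity score per (search, template) pair."""
    return [[dot(s, t) for t in template] for s in search]


def cross_attention_fuse(search, template):
    """Generic scaled dot-product cross-attention (illustrative assumption).

    Each search token queries the template tokens. The softmax gives more
    weight to strong matches and less to weak ones before the template
    features are aggregated and added back to the search token.
    """
    dim = len(search[0])
    fused = []
    for s in search:
        scores = [dot(s, t) / math.sqrt(dim) for t in template]
        weights = softmax(scores)
        # Weighted sum of template features for this search token.
        agg = [sum(w * t[k] for w, t in zip(weights, template)) for k in range(dim)]
        # Residual connection: keep the original search feature.
        fused.append([a + b for a, b in zip(s, agg)])
    return fused


# Toy features: 3 search tokens and 2 template tokens, each 4-dimensional.
search = [[1.0, 0.0, 0.0, 0.0], [0.0, 1.0, 0.0, 0.0], [0.5, 0.5, 0.0, 0.0]]
template = [[4.0, 0.0, 0.0, 0.0], [0.0, 4.0, 0.0, 0.0]]

corr = cross_correlation(search, template)
fused = cross_attention_fuse(search, template)
```

Here `corr` is a plain score map, while `fused` keeps the feature dimension of the search tokens, so later layers can continue to operate on features rather than on a single similarity value per location.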
About the journal:
With a focus on research in artificial intelligence and neural networks, this journal addresses issues involving solutions of real-life manufacturing, defense, management, government and industrial problems which are too complex to be solved through conventional approaches and require the simulation of intelligent thought processes, heuristics, applications of knowledge, and distributed and parallel processing. The integration of these multiple approaches in solving complex problems is of particular importance.
The journal presents new and original research and technological developments, addressing real and complex issues applicable to difficult problems. It provides a medium for exchanging scientific research and technological achievements accomplished by the international community.