Augment One With Others: Generalizing to Unforeseen Variations for Visual Tracking
Jinpu Zhang, Ziwen Li, Ruonan Wei, Yuehuan Wang
IEEE Transactions on Multimedia, vol. 27, pp. 1461-1474 (Journal Article), published 2024-12-24
DOI: 10.1109/TMM.2024.3521842
https://ieeexplore.ieee.org/document/10814671/
Citation count: 0
Abstract
Unforeseen appearance variation is a challenging factor in visual tracking. This paper offers a novel solution based on semantic data augmentation, which supports the offline training of trackers for better generalization. We use knowledge drawn from existing samples to augment other samples in terms of both diversity and hardness. First, we posit that the similarity matching space of Siamese-like models has class-agnostic transferability. Building on this, we design Latent Augmentation (LaAug), which transfers relevant variations and suppresses irrelevant ones between the training similarity embeddings of different classes, so that the model can generalize across a more diverse semantic distribution. Then, we propose Semantic Interaction Mix (SIMix), which lets moments interact across different feature samples to contaminate their structure and texture attributes while retaining their other semantic attributes. SIMix simulates occlusion and complements the training distribution with hard cases. Empirically, the mixed features, combined with adversarial perturbations, make the model more robust to external environmental disturbances. Experiments on six challenging benchmarks show that three representative tracking models, namely SiamBAN, TransT, and OSTrack, are consistently improved by incorporating the proposed methods, without extra parameters or inference cost.
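The abstract does not describe how SIMix is implemented. The sketch below offers one rough illustration of what letting "moments interact between feature samples" can mean, using the generic MixStyle/AdaIN-style idea instead of the authors' method. Each channel of one feature map is normalized with its own mean and standard deviation (its first and second moments) and then re-scaled with statistics interpolated toward those of a second sample. This is an assumed, swapped-in technique, not SIMix itself. The names `channel_moments`, `moment_mix`, and `random_moment_mix`, the list-of-channels feature layout, and the Beta-sampled mixing weight are all hypothetical. SIMix's adversarial perturbations and LaAug are not modeled.

```python
import math
import random
from typing import List, Optional, Tuple

# Hypothetical layout: a feature map is C channels, each a flat list of H*W activations.
FeatureMap = List[List[float]]


def channel_moments(feat: FeatureMap, eps: float = 1e-6) -> Tuple[List[float], List[float]]:
    """Return per-channel mean and standard deviation (eps keeps std > 0)."""
    means, stds = [], []
    for ch in feat:
        mu = sum(ch) / len(ch)
        var = sum((x - mu) ** 2 for x in ch) / len(ch)
        means.append(mu)
        stds.append(math.sqrt(var + eps))
    return means, stds


def moment_mix(content: FeatureMap, other: FeatureMap, lam: float) -> FeatureMap:
    """Re-normalize `content` with moments interpolated toward those of `other`.

    lam = 1.0 keeps content's own statistics; lam = 0.0 adopts other's statistics.
    The normalized spatial pattern of `content` is preserved. Spatial sizes may
    differ because only per-channel statistics are transferred.
    """
    if len(content) != len(other):
        raise ValueError("both feature maps must have the same number of channels")
    mu_c, sd_c = channel_moments(content)
    mu_o, sd_o = channel_moments(other)
    mixed = []
    for ch, mc, sc, mo, so in zip(content, mu_c, sd_c, mu_o, sd_o):
        mu_mix = lam * mc + (1.0 - lam) * mo  # interpolated first moment
        sd_mix = lam * sc + (1.0 - lam) * so  # interpolated second moment (as std)
        mixed.append([(x - mc) / sc * sd_mix + mu_mix for x in ch])
    return mixed


def random_moment_mix(content: FeatureMap, other: FeatureMap, alpha: float = 0.1,
                      rng: Optional[random.Random] = None) -> FeatureMap:
    """Assumed MixStyle-like sampling: draw lam ~ Beta(alpha, alpha), then mix."""
    rng = rng or random.Random()
    lam = rng.betavariate(alpha, alpha)
    return moment_mix(content, other, lam)


if __name__ == "__main__":
    template = [[1.0, 2.0, 3.0, 4.0], [0.0, 0.0, 1.0, 1.0]]
    distractor = [[10.0, 12.0, 14.0, 16.0], [5.0, 5.0, 5.0, 6.0]]
    print(random_moment_mix(template, distractor, alpha=0.1, rng=random.Random(0)))
```

In this sketch, the first sample's normalized spatial layout survives while its channel statistics drift toward the second sample's. A Beta(α, α) weight with small α, as used in MixStyle, puts most draws near one of the two endpoints.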
About the journal:
The IEEE Transactions on Multimedia delves into diverse aspects of multimedia technology and applications, covering circuits, networking, signal processing, systems, software, and systems integration. The scope aligns with the Fields of Interest of the sponsors, ensuring a comprehensive exploration of research in multimedia.