Siam2C: Siamese visual segmentation and tracking with classification-rank loss and classification-aware
Authors: Bangjun Lei, Qishuai Ding, Weisheng Li, Hao Tian, Lifang Zhou
Journal: Applied Intelligence, vol. 54, no. 24, pp. 12898–12921 (published 2024-10-08)
DOI: 10.1007/s10489-024-05840-0
URL: https://link.springer.com/article/10.1007/s10489-024-05840-0
JCR: Q2, Computer Science, Artificial Intelligence (Impact Factor 3.4)
Citations: 0
Abstract
Siamese visual trackers based on segmentation have garnered considerable attention due to their high accuracy. However, these trackers rely solely on simple classification confidence to distinguish positive from negative samples (foreground from background) and lack more precise discrimination between objects. Moreover, the backbone network focuses on local information during feature extraction and fails to capture the long-range contextual semantics crucial for classification. Consequently, these trackers are highly susceptible to interference during tracking, leading to erroneous object segmentation and subsequent tracking failures that compromise robustness. To address this, we propose a Siamese visual segmentation and tracking network with classification-rank loss and classification-aware attention (Siam2C). We design a classification-rank loss (CRL) algorithm to enlarge the margin between positive and negative samples, ensuring that positive samples are ranked higher than negative ones. This optimization improves the network’s ability to learn from positive and negative samples, allowing the tracker to accurately select the object for segmentation and tracking rather than being misled by interfering targets. Additionally, we design a classification-aware attention module (CAM), which employs spatial and channel self-attention mechanisms to capture long-range dependencies between different positions in the feature map. The module enhances the feature representation capability of the backbone network, providing richer global contextual semantic information for the tracking network’s classification decisions. Extensive experiments on the VOT2016, VOT2018, VOT2019, OTB100, UAV123, GOT-10k, DAVIS2016, and DAVIS2017 datasets demonstrate the outstanding performance of Siam2C.
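The classification-rank idea described in the abstract — enlarging the margin between positive and negative samples so that positives are ranked above negatives — can be illustrated with a generic hinge-style pairwise ranking loss. This is a minimal NumPy sketch of that general principle, not the paper's exact CRL formulation; the `margin` value and the all-pairs averaging are assumptions for illustration.

```python
import numpy as np

def classification_rank_loss(pos_scores, neg_scores, margin=0.5):
    """Hinge-style pairwise ranking loss (illustrative sketch).

    Penalizes every (positive, negative) pair whose positive score does
    not exceed the negative score by at least `margin`, which pushes
    positive samples to rank above negative ones.
    """
    pos = np.asarray(pos_scores, dtype=float)
    neg = np.asarray(neg_scores, dtype=float)
    # All positive/negative pairs via broadcasting: shape (P, N)
    violation = margin - (pos[:, None] - neg[None, :])
    return float(np.maximum(0.0, violation).mean())

# Well-separated scores incur zero loss; overlapping scores are penalized.
loss_sep = classification_rank_loss([0.9, 0.8], [0.1, 0.2], margin=0.5)
loss_mix = classification_rank_loss([0.6, 0.5], [0.4, 0.55], margin=0.5)
```

In this sketch `loss_sep` is zero because every positive outscores every negative by more than the margin, while `loss_mix` is positive because the score distributions overlap — the gradient of such a loss would push the overlapping pairs apart.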
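The classification-aware attention module combines spatial and channel self-attention to capture long-range dependencies across the feature map. The sketch below shows the generic dual-attention pattern (position attention over H×W locations plus channel attention over C channels, fused residually) in plain NumPy; the learned projection layers, scaling factors, and fusion weights of the actual CAM are omitted, so this is an assumption-laden illustration of the mechanism, not the paper's implementation.

```python
import numpy as np

def softmax(x, axis=-1):
    """Numerically stable softmax along `axis`."""
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def classification_aware_attention(feat):
    """Dual self-attention sketch over a (C, H, W) feature map.

    Spatial branch: each of the H*W positions attends to all others,
    capturing long-range dependencies between positions.
    Channel branch: each channel attends to all others, modeling
    inter-channel semantics. Both are fused with a residual connection.
    """
    C, H, W = feat.shape
    X = feat.reshape(C, H * W)                 # (C, N), N = H*W

    # Spatial (position) attention: (N, N) affinity over positions.
    attn_pos = softmax(X.T @ X, axis=-1)       # row i: weights over all positions
    spatial = X @ attn_pos.T                   # (C, N) position-mixed features

    # Channel attention: (C, C) affinity over channels.
    attn_ch = softmax(X @ X.T, axis=-1)        # row c: weights over all channels
    channel = attn_ch @ X                      # (C, N) channel-mixed features

    # Residual fusion keeps the original backbone features intact.
    return feat + (spatial + channel).reshape(C, H, W)

rng = np.random.default_rng(0)
f = rng.standard_normal((4, 8, 8))
out = classification_aware_attention(f)
```

Because the attention maps are dense (every position/channel pair gets a weight), the output at each location aggregates context from the whole feature map — the global semantic information the abstract says the local backbone alone fails to provide.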
About the journal:
Focusing on research in artificial intelligence and neural networks, this journal addresses real-life manufacturing, defense, management, government, and industrial problems that are too complex to be solved through conventional approaches and that require the simulation of intelligent thought processes, heuristics, applications of knowledge, and distributed and parallel processing. The integration of these multiple approaches in solving complex problems is of particular importance.
The journal presents new and original research and technological developments, addressing real and complex issues applicable to difficult problems. It provides a medium for exchanging scientific research and technological achievements accomplished by the international community.