Local Fine-Grained Visual Tracking
Authors: Jingjing Wu; Yifan Sun; Richang Hong
Journal: IEEE Transactions on Multimedia, vol. 27, pp. 3426-3436 (Journal Article)
Publication date: 2025-01-27
DOI: 10.1109/TMM.2025.3535329
URL: https://ieeexplore.ieee.org/document/10855545/
Impact factor: 8.4; JCR: Q1 (Computer Science, Information Systems)
Open access: No
Citations: 0
Abstract
This paper introduces a novel local fine-grained visual tracking task, which aims to precisely locate arbitrary local parts of objects. The task is motivated by our observation that in many realistic scenarios, users need to track a local part rather than a holistic object. However, the absence of an evaluation dataset and the distinctive characteristics of local fine-grained targets pose additional challenges for this research. To tackle these issues, this paper first constructs a local fine-grained tracking (LFT) dataset to evaluate tracking performance on local fine-grained targets. Second, it designs a solution to handle the challenges posed by the properties of local objects, including ambiguity and high-proportion backgrounds. The solution consists of a hierarchical adaptive mask mechanism and foreground-background differentiated learning. The former adaptively searches for and masks ambiguity, which drives the network to concentrate on the local target instead of the holistic object. The latter distinguishes foreground from background in an unsupervised manner, which helps mitigate the impact of high-proportion backgrounds. Extensive analytic experiments verify the effectiveness of each submodule in the proposed fine-grained tracker.
About the journal:
The IEEE Transactions on Multimedia delves into diverse aspects of multimedia technology and applications, covering circuits, networking, signal processing, systems, software, and systems integration. The scope aligns with the Fields of Interest of the sponsors, ensuring a comprehensive exploration of research in multimedia.