A framework of specialized knowledge distillation for Siamese tracker on challenging attributes

IF 2.3; Zone 4, Computer Science; Q3 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE

Machine Vision and Applications Pub Date : 2024-07-09 DOI:10.1007/s00138-024-01578-4

Yiding Li, Atsushi Shimada, Tsubasa Minematsu, Cheng Tang


Citations: 0

Abstract

In recent years, Siamese network-based trackers have achieved significant improvements in real-time tracking. Despite their success, performance bottlenecks caused by the unavoidably complex scenarios in target-tracking tasks are becoming increasingly hard to ignore. For example, occlusion and fast motion can easily cause tracking failures and are labeled as challenging attributes in many high-quality tracking databases. In addition, Siamese trackers tend to suffer from high memory costs, which restricts their applicability to mobile devices with tight memory budgets. To address these issues, we propose a Specialized teachers Distilled Siamese Tracker (SDST) framework to learn a student tracker that is small and fast and has enhanced performance on challenging attributes. SDST introduces two types of teachers for multi-teacher distillation: a general teacher and specialized teachers. The former imparts basic knowledge to the students. The latter transfer specialized knowledge to the students, which helps improve their performance on challenging attributes. For students to efficiently capture critical knowledge from the two types of teachers, SDST is equipped with a carefully designed multi-teacher knowledge distillation model. Our model contains two processes: general teacher-student knowledge transfer and specialized teachers-student knowledge transfer. Extensive empirical evaluations on several popular Siamese trackers demonstrate the generality and effectiveness of our framework. Moreover, the results on Large-scale Single Object Tracking (LaSOT) show that the proposed method achieves a significant improvement of more than 2–4% on most challenging attributes. SDST also maintains high overall performance while achieving compression rates of up to 8x and frame rates of 252 FPS, and obtains outstanding accuracy on all challenging attributes.
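The abstract names two transfer processes (general teacher to student, and specialized teachers to student) but does not give the loss functions. As a rough illustration only, the sketch below combines the two with a generic temperature-scaled soft-label distillation loss. This is an assumption, not the SDST formulation: the function names, the `alpha` balance, the per-teacher weights, and the use of plain logit vectors in place of tracker response maps are all hypothetical.

```python
import math


def softmax(logits, temperature=1.0):
    """Temperature-scaled softmax over a list of logits."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) for two discrete distributions of equal length."""
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q))


def multi_teacher_kd_loss(student_logits, general_logits, specialized_logits,
                          specialized_weights, temperature=2.0, alpha=0.5):
    """Hypothetical combined loss (not taken from the paper).

    alpha weights the general-teacher term; (1 - alpha) weights the
    weighted sum over the specialized teachers.
    """
    if len(specialized_logits) != len(specialized_weights):
        raise ValueError("need one weight per specialized teacher")
    student = softmax(student_logits, temperature)
    general_term = kl_divergence(softmax(general_logits, temperature), student)
    specialized_term = sum(
        w * kl_divergence(softmax(t, temperature), student)
        for t, w in zip(specialized_logits, specialized_weights)
    )
    return alpha * general_term + (1 - alpha) * specialized_term


# Toy example: one general teacher and two specialized teachers
# (say, one per challenging attribute such as occlusion and fast motion).
loss = multi_teacher_kd_loss(
    student_logits=[1.0, 0.5, -0.5],
    general_logits=[2.0, 0.0, -1.0],
    specialized_logits=[[2.5, -0.5, -1.0], [1.5, 1.0, -2.0]],
    specialized_weights=[0.5, 0.5],
)
print(f"combined distillation loss: {loss:.4f}")
```

In this sketch the loss is zero when the student already matches every teacher and grows as the distributions diverge. How the actual framework weights or schedules the two processes is not stated in the abstract.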




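Source journal

Machine Vision and Applications (Engineering and Technology: Engineering, Electrical & Electronic)

CiteScore

6.30

Self-citation rate

3.00%

Articles published

Review time

8.7 months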

Journal description: Machine Vision and Applications publishes high-quality technical contributions in machine vision research and development. Specifically, the editors encourage submissions in all applications and engineering aspects of image-related computing. In particular, original contributions dealing with scientific, commercial, industrial, military, and biomedical applications of machine vision are all within the scope of the journal. Particular emphasis is placed on engineering and technology aspects of image processing and computer vision. The following aspects of machine vision applications are of interest: algorithms, architectures, VLSI implementations, AI techniques and expert systems for machine vision, front-end sensing, multidimensional and multisensor machine vision, real-time techniques, image databases, virtual reality and visualization. Papers must include a significant experimental validation component.