EFTrack: Enhanced fusion for visual object tracking

IF 3.1 | CAS Zone 4, Computer Science | JCR Q2, COMPUTER SCIENCE, INFORMATION SYSTEMS

Journal of Visual Communication and Image Representation | Pub Date: 2025-07-30 | DOI: 10.1016/j.jvcir.2025.104554

Xu Guan, Chunyan Hu, Lin Xie, Shuai Yang, Feifei Lee, Qiu Chen

Volume 111, Article 104554. https://www.sciencedirect.com/science/article/pii/S1047320325001683

Citations: 0

Abstract

Recent deep learning-based object trackers mainly adopt the single-stream, single-stage framework. However, this approach often overlooks the limitations of the backbone network itself. To address this issue, this paper builds the tracker directly on an independent backbone network and proposes several optimizations. First, a contour information enhancement (CIE) module distinguishes objects from the background through frequency-domain filtering. Second, a patch information fusion (PIF) module enables information interaction between non-overlapping patches. Third, a lightweight multi-scale feature fusion module enhances the backbone network's capability to learn multi-scale information. The network's generalization is further improved by using the DropMAE pre-trained model. The proposed tracker performs strongly on benchmark datasets: on GOT-10k it surpasses TATrack-B and SeqTrack-B384 by 3.4% and 1.9%, respectively, in terms of the average overlap (AO) metric. The code is released at https://github.com/Nirvanalll/EFTrack.
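Two ideas in the abstract can be illustrated with a short sketch: frequency-domain filtering as a way to emphasize contours, and the AO metric used for the GOT-10k comparison. The code below is not the authors' CIE module or the benchmark's evaluation toolkit. It is a minimal NumPy illustration under stated assumptions: a plain FFT high-pass filter stands in for "frequency domain filtering", the `cutoff` parameter and all function names are hypothetical, boxes are assumed to be in (x, y, w, h) format, and AO is computed as the mean IoU over the frames of a single sequence.

```python
import numpy as np


def highpass_filter(image: np.ndarray, cutoff: float = 0.1) -> np.ndarray:
    """Suppress low spatial frequencies of a 2-D grayscale image.

    `cutoff` is a hypothetical parameter: a radius in cycles per pixel
    (0 to 0.5) at or below which frequency components are zeroed.
    """
    h, w = image.shape
    spectrum = np.fft.fft2(image)
    # Frequency coordinates of every FFT bin, in cycles per pixel.
    fy = np.fft.fftfreq(h)[:, None]
    fx = np.fft.fftfreq(w)[None, :]
    radius = np.sqrt(fx ** 2 + fy ** 2)
    # Keep only high frequencies, where edges and contours live.
    mask = radius > cutoff
    return np.real(np.fft.ifft2(spectrum * mask))


def iou(box_a, box_b) -> float:
    """Intersection over union of two boxes given as (x, y, w, h)."""
    ax, ay, aw, ah = box_a
    bx, by, bw, bh = box_b
    inter_w = max(0.0, min(ax + aw, bx + bw) - max(ax, bx))
    inter_h = max(0.0, min(ay + ah, by + bh) - max(ay, by))
    inter = inter_w * inter_h
    union = aw * ah + bw * bh - inter
    return inter / union if union > 0 else 0.0


def average_overlap(pred_boxes, gt_boxes) -> float:
    """Mean IoU between predicted and ground-truth boxes over one sequence."""
    return float(np.mean([iou(p, g) for p, g in zip(pred_boxes, gt_boxes)]))
```

A flat image contains no contours, so `highpass_filter` returns an array of (numerically) zeros for it, while a step edge survives as a response concentrated around the edge. How the benchmark aggregates overlaps across sequences is outside the scope of this sketch.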


Source journal

Journal of Visual Communication and Image Representation (Engineering & Technology - Computer Science: Software Engineering)

CiteScore: 5.40

Self-citation rate: 11.50%

Articles published: 188

Review time: 9.9 months

About the journal: The Journal of Visual Communication and Image Representation publishes papers on state-of-the-art visual communication and image representation, with emphasis on novel technologies and theoretical work in this multidisciplinary area of pure and applied research. The field of visual communication and image representation is considered in its broadest sense and covers both digital and analog aspects as well as processing and communication in biological visual systems.