EFTrack: Enhanced fusion for visual object tracking

IF 3.1 | Zone 4, Computer Science | JCR Q2, COMPUTER SCIENCE, INFORMATION SYSTEMS
Xu Guan, Chunyan Hu, Lin Xie, Shuai Yang, Feifei Lee, Qiu Chen
Journal of Visual Communication and Image Representation, vol. 111, Article 104554. Published 2025-07-30. DOI: 10.1016/j.jvcir.2025.104554. https://www.sciencedirect.com/science/article/pii/S1047320325001683
Citations: 0

Abstract

Recent deep learning-based networks for object tracking mainly adopt a single-stream, single-stage framework. However, this approach often overlooks the limitations of the backbone network itself. To address this issue, this paper builds the tracker directly on an independent backbone network and proposes several optimizations. First, we propose a contour information enhancement (CIE) module that distinguishes objects from the background through frequency-domain filtering. Second, a patch information fusion (PIF) module is introduced to enable information interaction between non-overlapping patches. Third, a lightweight multi-scale feature fusion module is proposed to strengthen the backbone network's ability to learn multi-scale information. In addition, the network's generalization is improved by using the DropMAE pre-trained model. The proposed tracker performs strongly on benchmark datasets: on the GOT-10k dataset it surpasses TATrack-B and SeqTrack-B384 by 3.4% and 1.9%, respectively, in terms of the AO metric. The code is released at https://github.com/Nirvanalll/EFTrack.
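For readers unfamiliar with the AO (average overlap) metric cited above, the following is a minimal sketch of how it can be computed for one sequence: the mean intersection-over-union between predicted and ground-truth boxes across frames. The names `Box`, `iou`, and `average_overlap` are illustrative and do not come from the paper or its code. Boxes are assumed to be in `(x, y, width, height)` form, and the official benchmark toolkit may aggregate across sequences differently.

```python
from typing import Sequence, Tuple

# Assumed box layout: (x, y, width, height) with a top-left origin.
Box = Tuple[float, float, float, float]


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    ax, ay, aw, ah = a
    bx, by, bw, bh = b
    # Width and height of the overlapping region, clamped at zero.
    inter_w = max(0.0, min(ax + aw, bx + bw) - max(ax, bx))
    inter_h = max(0.0, min(ay + ah, by + bh) - max(ay, by))
    inter = inter_w * inter_h
    union = aw * ah + bw * bh - inter
    return inter / union if union > 0 else 0.0


def average_overlap(pred: Sequence[Box], truth: Sequence[Box]) -> float:
    """Mean per-frame IoU over one sequence (a simplified AO)."""
    if len(pred) != len(truth) or not pred:
        raise ValueError("pred and truth must be non-empty and equal in length")
    return sum(iou(p, t) for p, t in zip(pred, truth)) / len(pred)


# Three frames: a perfect match, a partial overlap, and a complete miss.
pred = [(0, 0, 10, 10), (5, 5, 10, 10), (20, 20, 10, 10)]
truth = [(0, 0, 10, 10), (0, 0, 10, 10), (0, 0, 10, 10)]
print(f"AO = {average_overlap(pred, truth):.3f}")
```

In this toy sequence the per-frame overlaps are 1, 25/175, and 0, so a single lost frame pulls the average down sharply; this is why AO rewards trackers that stay on target throughout a sequence.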
Source journal
Journal of Visual Communication and Image Representation (Engineering Technology - Computer Science: Software Engineering)
CiteScore: 5.40
Self-citation rate: 11.50%
Articles published: 188
Review time: 9.9 months
About the journal: The Journal of Visual Communication and Image Representation publishes papers on state-of-the-art visual communication and image representation, with emphasis on novel technologies and theoretical work in this multidisciplinary area of pure and applied research. The field of visual communication and image representation is considered in its broadest sense and covers both digital and analog aspects as well as processing and communication in biological visual systems.