FocusTrack: Enhancing object detection and tracking for small and ambiguous objects

Impact factor 3.1; Zone 4 (Computer Science); JCR Q2 (COMPUTER SCIENCE, INFORMATION SYSTEMS)

Journal of Visual Communication and Image Representation. Pub date: 2025-08-07. DOI: 10.1016/j.jvcir.2025.104549

Said Baz Jahfar Khan , Chuanyue Li , Peng Zhang

Volume 111, Article 104549. Publisher page: https://www.sciencedirect.com/science/article/pii/S1047320325001634

Citations: 0

Abstract



FocusTrack: Enhancing object detection and tracking for small and ambiguous objects

Multi-object tracking (MOT) is an essential task in computer vision, but it still faces significant challenges in real-world applications, especially with small, ambiguous, and occluded objects in crowded environments. This study introduces FocusTrack, a robust one-stage multi-object tracking system designed to improve object detection and trajectory association under challenging conditions. FocusTrack first fine-tunes YOLOv10, a modern high-performance detector, on several datasets (MOT17, MOT20, CityPersons, ETHZ, and CrowdHuman). We apply copy-paste augmentation to key training datasets to improve the detection of small and distant objects, which significantly improves performance in complex visual scenes.
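The abstract does not say how FocusTrack selects or places the pasted objects. As a minimal sketch of one common variant, under stated assumptions: objects whose box area is below a hypothetical `max_area` threshold are cropped from a source image and pasted at random positions in a target image, and a placement is skipped if it overlaps an existing box by more than a hypothetical `max_iou`. The function and parameter names are illustrative, not taken from the paper.

```python
import numpy as np


def iou(a, b):
    """Plain IoU of two [x1, y1, x2, y2] boxes."""
    ix = max(0, min(a[2], b[2]) - max(a[0], b[0]))
    iy = max(0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = ix * iy
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def copy_paste(dst_img, dst_boxes, src_img, src_boxes,
               max_area=32 * 32, max_iou=0.1, rng=None):
    """Paste small objects from src_img into a copy of dst_img (hypothetical helper).

    Boxes are integer [x1, y1, x2, y2] pixel coordinates lying inside their image.
    Returns the augmented image and the extended box list; inputs are not modified.
    """
    rng = np.random.default_rng() if rng is None else rng
    out = dst_img.copy()
    boxes = [list(b) for b in dst_boxes]
    H, W = out.shape[:2]
    for x1, y1, x2, y2 in src_boxes:
        w, h = x2 - x1, y2 - y1
        # Keep only small, valid objects that fit in the target image.
        if w <= 0 or h <= 0 or w * h > max_area or w >= W or h >= H:
            continue
        nx = int(rng.integers(0, W - w))  # upper bound is exclusive
        ny = int(rng.integers(0, H - h))
        cand = [nx, ny, nx + w, ny + h]
        # Skip placements that would bury an existing target.
        if any(iou(cand, b) > max_iou for b in boxes):
            continue
        out[ny:ny + h, nx:nx + w] = src_img[y1:y2, x1:x2]
        boxes.append(cand)
    return out, boxes
```

This sketch does no edge blending or rescaling of the pasted patches; production augmentation pipelines often add both.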

To ensure precise and consistent tracking, FocusTrack introduces several modules: Modified Soft Buffered IoU (MS-BIoU), which adapts IoU matching to object size and detection confidence; Adaptive Similarity Enhancement (ASE), which refines similarity matrices through occlusion-aware, motion-scaled, and size-weighted adjustments; and Spatial-Temporal Confidence Enhancement (STCE), which dynamically enhances detection confidence using spatial overlap, motion patterns, and crowd density. In addition, our Track Recovery and Association Refinement (TRAR) module recovers lost objects through adaptive re-association, SV-Link provides motion-aware, occlusion-resistant association, and SOTS refines trajectories with Gaussian Process Regression tailored to object size and occlusion intensity.
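The abstract does not give the MS-BIoU formula. A hedged sketch of the general idea follows, assuming the buffered-IoU convention of enlarging both boxes by a scale b times their width and height before computing IoU, with b made larger for small or low-confidence detections so that nearby but non-overlapping small boxes can still be matched. `buffer_scale` and its constants (`base`, `ref_area`, `max_scale`) are hypothetical choices for illustration, not the authors' parameters.

```python
import numpy as np


def iou(a, b):
    """Plain IoU of two [x1, y1, x2, y2] boxes."""
    ix = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    iy = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = ix * iy
    union = ((a[2] - a[0]) * (a[3] - a[1])
             + (b[2] - b[0]) * (b[3] - b[1]) - inter)
    return inter / union if union > 0 else 0.0


def expand(box, b):
    """Enlarge a box by b * width and b * height on each side."""
    w, h = box[2] - box[0], box[3] - box[1]
    return [box[0] - b * w, box[1] - b * h, box[2] + b * w, box[3] + b * h]


def buffer_scale(box, conf, base=0.2, ref_area=64 * 128, max_scale=0.6):
    """Hypothetical adaptive buffer: wider for small or low-confidence boxes."""
    area = max((box[2] - box[0]) * (box[3] - box[1]), 1e-6)
    size_gain = min(max(np.sqrt(ref_area / area), 1.0), 3.0)  # in [1, 3]
    conf_gain = 2.0 - conf  # conf in [0, 1] maps to a gain in [1, 2]
    return min(base * size_gain * conf_gain, max_scale)


def adaptive_biou_matrix(tracks, dets, det_confs):
    """Buffered IoU between every predicted track box and every detection.

    The buffer is chosen per detection and applied to both boxes of a pair.
    """
    m = np.zeros((len(tracks), len(dets)))
    for j, (d, c) in enumerate(zip(dets, det_confs)):
        b = buffer_scale(d, c)
        ed = expand(d, b)
        for i, t in enumerate(tracks):
            m[i, j] = iou(expand(t, b), ed)
    return m
```

A tracker would typically turn this into a cost matrix (for example `1 - m`), solve it with Hungarian matching such as `scipy.optimize.linear_sum_assignment`, and reject pairs below an IoU gate.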

Evaluated on the challenging MOT17 and MOT20 benchmarks, FocusTrack achieves HOTA scores of 66.91 and 66.5, MOTA scores of 82.32 and 77.9, and IDF1 scores of 82.96 and 82.1, respectively, exceeding other leading online trackers such as BoostTrack++ and BoT-SORT. These results indicate that FocusTrack is an efficient, real-time MOT framework that is particularly effective in complex, crowded scenes with small or partially occluded objects.
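The scores above follow standard MOT evaluation conventions, and benchmark tables report them multiplied by 100. As a minimal sketch of two of them (not code from the paper): CLEAR MOT accuracy is MOTA = 1 - (FN + FP + IDSW) / GT, with counts summed over all frames, and identity F1 is IDF1 = 2·IDTP / (2·IDTP + IDFP + IDFN). The helper names and toy counts are hypothetical, and the ID-switch counter assumes per-frame ground-truth-to-track matches have already been computed. HOTA is omitted because it additionally averages over localization thresholds.

```python
def mota(num_fn, num_fp, num_idsw, num_gt):
    """CLEAR MOT accuracy: 1 - (FN + FP + IDSW) / GT, counts summed over frames."""
    if num_gt <= 0:
        raise ValueError("num_gt must be positive")
    return 1.0 - (num_fn + num_fp + num_idsw) / num_gt


def idf1(idtp, idfp, idfn):
    """Identity F1: 2 * IDTP / (2 * IDTP + IDFP + IDFN)."""
    denom = 2 * idtp + idfp + idfn
    return 2 * idtp / denom if denom else 0.0


def count_id_switches(frame_matches):
    """Count ID switches from per-frame {gt_id: track_id} matches.

    A switch is counted when a ground-truth object is matched to a track ID
    different from the one it was last matched to; frames in which the object
    is unmatched do not reset its last ID.
    """
    last = {}
    switches = 0
    for matches in frame_matches:
        for gt_id, trk_id in matches.items():
            if gt_id in last and last[gt_id] != trk_id:
                switches += 1
            last[gt_id] = trk_id
    return switches


# Toy sequence with made-up counts (not results from the paper):
# object 2 is present in all 4 frames but unmatched in frame 3 (one FN).
frames = [{1: "a", 2: "b"}, {1: "a", 2: "c"}, {1: "a"}, {1: "a", 2: "c"}]
sw = count_id_switches(frames)  # object 2 switches from "b" to "c" once
print(sw, round(100 * mota(num_fn=1, num_fp=2, num_idsw=sw, num_gt=8), 2))
```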


Source journal

Journal of Visual Communication and Image Representation (subject category: Engineering and Technology, Computer Science: Software Engineering)

CiteScore: 5.40
Self-citation rate: 11.50%
Articles published: 188
Review time: 9.9 months

Journal description: The Journal of Visual Communication and Image Representation publishes papers on state-of-the-art visual communication and image representation, with emphasis on novel technologies and theoretical work in this multidisciplinary area of pure and applied research. The field of visual communication and image representation is considered in its broadest sense and covers both digital and analog aspects as well as processing and communication in biological visual systems.