FocusTrack: Enhancing object detection and tracking for small and ambiguous objects
Authors: Said Baz Jahfar Khan, Chuanyue Li, Peng Zhang
Journal of Visual Communication and Image Representation, vol. 111, Article 104549. Published 2025-08-07.
DOI: 10.1016/j.jvcir.2025.104549
URL: https://www.sciencedirect.com/science/article/pii/S1047320325001634
Multi-object tracking (MOT) is an essential task in computer vision, but it still faces significant challenges in real-world applications, especially with small, ambiguous, and occluded objects in crowded environments. This study introduces FocusTrack, a robust one-stage multi-object tracking system designed to improve object detection and trajectory association under such conditions. FocusTrack begins by fine-tuning YOLOv10, a modern high-performance detector, on several datasets (MOT17, MOT20, CityPersons, ETHZ, and CrowdHuman). We apply copy-paste augmentation to key training datasets to improve the detection of small and distant objects, which significantly improves performance in complex visual scenes.
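The copy-paste augmentation mentioned above can be illustrated with a minimal sketch. The abstract does not describe the authors' procedure, so everything here is an assumption: the function name `copy_paste_box`, the box-level (rather than mask-level) pasting, and the nearest-neighbour downscaling used to imitate small, distant objects are hypothetical choices made only to show the general idea.

```python
import numpy as np


def copy_paste_box(src_img, src_box, dst_img, dst_boxes, top_left, scale=1.0):
    """Paste one object crop from src_img into a copy of dst_img.

    Hypothetical helper, not the authors' code. Boxes are (x1, y1, x2, y2)
    integer pixel coordinates; top_left is the (x, y) paste position and is
    assumed to lie inside dst_img. Returns the augmented image and the
    extended list of boxes.
    """
    x1, y1, x2, y2 = src_box
    crop = src_img[y1:y2, x1:x2]
    h, w = crop.shape[:2]

    # Nearest-neighbour resize with plain indexing; scale < 1 imitates a
    # smaller, more distant object.
    new_h = max(1, int(round(h * scale)))
    new_w = max(1, int(round(w * scale)))
    rows = np.arange(new_h) * h // new_h
    cols = np.arange(new_w) * w // new_w
    crop = crop[rows][:, cols]

    # Clip the pasted region to the destination image bounds.
    px, py = top_left
    dst_h, dst_w = dst_img.shape[:2]
    new_h = min(new_h, dst_h - py)
    new_w = min(new_w, dst_w - px)
    if new_h <= 0 or new_w <= 0:
        return dst_img.copy(), list(dst_boxes)

    out = dst_img.copy()
    out[py:py + new_h, px:px + new_w] = crop[:new_h, :new_w]
    return out, list(dst_boxes) + [(px, py, px + new_w, py + new_h)]


# Usage: paste a half-size copy of a white object into a black frame.
src = np.full((10, 10, 3), 255, dtype=np.uint8)
dst = np.zeros((20, 20, 3), dtype=np.uint8)
aug, boxes = copy_paste_box(src, (2, 2, 6, 8), dst, [], top_left=(5, 5), scale=0.5)
```

A production pipeline would more likely paste along instance masks and blend the edges; the box-level version is used here only because it stays short and depends on NumPy alone.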
To ensure precise and consistent tracking, FocusTrack introduces several modules: Modified Soft Buffered IoU (MS-BIoU), which adapts IoU matching to object size and detection confidence; Adaptive Similarity Enhancement (ASE), which refines similarity matrices through occlusion-aware, motion-scaled, and size-weighted adjustments; and Spatial-Temporal Confidence Enhancement (STCE), which dynamically improves detection confidence using spatial overlap, motion patterns, and crowd density. In addition, our Track Recovery and Association Refinement (TRAR) module recovers missed objects via adaptive re-association, SV-Link provides motion-aware, occlusion-resistant associations, and SOTS refines trajectories using Gaussian Process Regression tailored to object dimensions and occlusion intensity.
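To make the idea of size- and confidence-dependent IoU matching concrete, the sketch below computes a buffered IoU in which both boxes are enlarged before the overlap is measured. The abstract gives no formula for MS-BIoU, so this is not the authors' definition: the rule in `adaptive_buffer`, its `base` and `ref_area` parameters, and all function names are hypothetical, chosen only to show how a buffer could grow for small or low-confidence detections.

```python
def expand(box, b):
    """Enlarge an (x1, y1, x2, y2) box by a fraction b of its width and height per side."""
    x1, y1, x2, y2 = box
    w, h = x2 - x1, y2 - y1
    return (x1 - b * w, y1 - b * h, x2 + b * w, y2 + b * h)


def iou(a, b):
    """Plain intersection over union of two (x1, y1, x2, y2) boxes."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def adaptive_buffer(box, conf, base=0.3, ref_area=64.0 * 64.0):
    """Hypothetical buffer rule: larger for small boxes and for low confidence."""
    area = (box[2] - box[0]) * (box[3] - box[1])
    size_term = min(1.0, ref_area / max(area, 1e-9))  # 1 for small boxes, shrinks for large ones
    return base * size_term * (2.0 - conf)            # conf is assumed to lie in [0, 1]


def buffered_iou(track_box, det_box, det_conf):
    """IoU after expanding both boxes by a buffer derived from the detection."""
    b = adaptive_buffer(det_box, det_conf)
    return iou(expand(track_box, b), expand(det_box, b))


# Usage: two small boxes that do not touch still get a non-zero buffered overlap.
track, det = (0, 0, 10, 10), (12, 0, 22, 10)
plain = iou(track, det)                   # 0.0, the boxes are disjoint
buffered = buffered_iou(track, det, 0.5)  # greater than 0 after expansion
```

The design point is that a small object moving a few pixels between frames can lose all plain IoU with its track, while an enlarged box keeps the pair matchable; for large boxes the buffer shrinks toward zero and the score approaches ordinary IoU.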
On the challenging MOT17 and MOT20 benchmarks, FocusTrack achieves HOTA scores of 66.91 and 66.5, MOTA scores of 82.32 and 77.9, and IDF1 scores of 82.96 and 82.1, respectively, exceeding other leading online trackers such as BoostTrack++ and BoT-SORT. These results show FocusTrack to be an efficient, real-time MOT framework that is especially effective in complex, crowded scenes with small or partially hidden objects.
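For readers unfamiliar with the reported metrics, the sketch below shows the standard definitions of MOTA and IDF1, plus the per-threshold HOTA score (the geometric mean of detection and association accuracy, which the full metric averages over localization thresholds). The function names and the counts in the usage lines are illustrative assumptions and are not taken from the paper.

```python
import math


def mota(fn, fp, idsw, num_gt):
    """MOTA = 1 - (misses + false positives + identity switches) / ground-truth boxes."""
    return 1.0 - (fn + fp + idsw) / num_gt


def idf1(idtp, idfp, idfn):
    """IDF1 = 2 * IDTP / (2 * IDTP + IDFP + IDFN), the F1 score of identity matches."""
    return 2 * idtp / (2 * idtp + idfp + idfn)


def hota_at_threshold(det_a, ass_a):
    """HOTA at one localization threshold: sqrt(DetA * AssA)."""
    return math.sqrt(det_a * ass_a)


# Usage with made-up counts (not results from the paper).
example_mota = mota(fn=100, fp=50, idsw=10, num_gt=1000)
example_idf1 = idf1(idtp=800, idfp=100, idfn=100)
example_hota = hota_at_threshold(det_a=0.64, ass_a=0.81)
```

MOTA is dominated by detection errors, while IDF1 and HOTA give more weight to keeping identities consistent, which is why all three are usually reported together.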
About the journal:
The Journal of Visual Communication and Image Representation publishes papers on state-of-the-art visual communication and image representation, with emphasis on novel technologies and theoretical work in this multidisciplinary area of pure and applied research. The field of visual communication and image representation is considered in its broadest sense and covers both digital and analog aspects as well as processing and communication in biological visual systems.