Title: A dual-stream parallel architecture for robust visual tracking using scale-aware region proposals
Authors: Sudha SK, Aji S
Journal: Future Generation Computer Systems-The International Journal of Escience, volume 175, Article 108079 (impact factor 6.2; JCR Q1, Computer Science, Theory & Methods)
Published: 2025-08-14
DOI: 10.1016/j.future.2025.108079
URL: https://www.sciencedirect.com/science/article/pii/S0167739X25003735
Abstract
Visual tracking in dynamic environments faces significant challenges such as occlusions, scale variations, and abrupt motion changes, particularly in traffic scenarios. Accurate tracking requires handling multi-scale objects and maintaining temporal correlations across video sequences. To address these challenges, we present a method that captures long-term dependencies in motion cues with a scale-aware region proposal (SARPro) network, which builds on a Faster R-CNN pipeline to predict high-quality region proposals for multi-scale video object detection and tracking (VODT). Features are extracted by a dual-stream feature pyramid network (DS-FPN) that captures both spatial and temporal patterns. SARPro generates precise bounding box proposals that account for object scale variations, and an iterative LSTM-based step fine-tunes the bounding boxes. A low-confidence track filter (LCTFilter) is integrated into the DeepSORT tracking algorithm to filter out the least confident tracks. SARPro is designed to run under multi-threaded parallel computing with GPU acceleration (MTPC-GPU) so that detection and tracking proceed simultaneously. Experiments on benchmark datasets show that SARPro significantly improves accuracy and achieves robust detection and tracking of small objects in complex video sequences while maintaining real-time performance: it attains an mAP of 91.35% at 57.2 FPS on UA-DETRAC and 88.57% at 41.9 FPS on BDD100K.
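The abstract does not say how the LCTFilter decides which tracks are "least confident", so the following Python sketch is only one plausible reading of the idea, not the authors' implementation. It assumes each track stores the detection confidences of the boxes matched to it, and it drops a track once its mean confidence falls below a threshold. The `Track` class, the `filter_low_confidence` function, and the `min_mean_conf` and `min_hits` parameters are all hypothetical names introduced for illustration; none of them come from the paper or from DeepSORT.

```python
from dataclasses import dataclass, field
from statistics import mean
from typing import List


@dataclass
class Track:
    """Hypothetical track record: an ID plus the confidences of its matched detections."""
    track_id: int
    confidences: List[float] = field(default_factory=list)


def filter_low_confidence(tracks, min_mean_conf=0.5, min_hits=3):
    """Return the tracks to keep (assumed rule, not taken from the paper).

    A track with fewer than `min_hits` observations is kept, because there is
    not yet enough evidence to judge it. Otherwise it is kept only if its mean
    detection confidence is at least `min_mean_conf`.
    """
    kept = []
    for track in tracks:
        if len(track.confidences) < min_hits:
            kept.append(track)
        elif mean(track.confidences) >= min_mean_conf:
            kept.append(track)
    return kept


tracks = [
    Track(1, [0.9, 0.85, 0.8]),  # consistently confident: kept
    Track(2, [0.3, 0.2, 0.4]),   # consistently weak: dropped
    Track(3, [0.1]),             # too young to judge: kept
]
kept_ids = [t.track_id for t in filter_low_confidence(tracks)]
print(kept_ids)  # [1, 3]
```

In a tracker loop, a filter like this would typically run after the association step of each frame, so that weak tracks stop consuming matching effort in later frames. The threshold values here are arbitrary placeholders.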
About the journal:
Computing infrastructures and systems are constantly evolving, resulting in increasingly complex and collaborative scientific applications. To cope with these advancements, there is a growing need for collaborative tools that can effectively map, control, and execute these applications.
Furthermore, with the explosion of Big Data, there is a requirement for innovative methods and infrastructures to collect, analyze, and derive meaningful insights from the vast amount of data generated. This necessitates the integration of computational and storage capabilities, databases, sensors, and human collaboration.
Future Generation Computer Systems aims to pioneer advancements in distributed systems, collaborative environments, high-performance computing, and Big Data analytics. It strives to stay at the forefront of developments in grids, clouds, and the Internet of Things (IoT) to effectively address the challenges posed by these wide-area, fully distributed sensing and computing systems.