Kailai Sun , Xinwei Wang , Shaobo Liu , Qianchuan Zhao , Gao Huang , Chang Liu
Title: Towards pedestrian head tracking: A benchmark dataset and a multi-source data fusion network
Journal: Engineering Applications of Artificial Intelligence, vol. 158, Article 111265 (published 2025-06-11)
DOI: 10.1016/j.engappai.2025.111265
URL: https://www.sciencedirect.com/science/article/pii/S0952197625012667
Abstract
Pedestrian detection and tracking in crowded video sequences have many applications, including autonomous driving, robot navigation, and pedestrian flow analysis. However, detecting and tracking pedestrians in high-density crowds faces many challenges, including intra-class occlusions, complex motions, and diverse poses. Although artificial intelligence (AI) models have made great progress in head detection, head tracking datasets and methods remain scarce. Existing head datasets have limited coverage of complex pedestrian flows and scenes (e.g., pedestrian interactions, occlusions, and object interference), so new head tracking datasets and methods are needed. To address these challenges, we present a Chinese Large-scale Cross-scene Pedestrian Head Tracking dataset (Cchead) and a Multi-source Data Fusion Network (MDFN). The dataset covers 10 diverse scenes comprising 50,528 frames, with about 2,366,249 heads and 2,358 tracks. It includes diverse pedestrian moving speeds and directions, as well as complex crowd flows with collision avoidance behaviors. Existing state-of-the-art (SOTA) algorithms are tested and compared on the Cchead dataset. MDFN is the first end-to-end convolutional neural network (CNN)-based head detection and tracking network that is jointly trained on red, green, blue (RGB) frames, pixel-level motion information (optical flow and frame difference maps), depth maps, and density maps from videos. Ablation experiments confirm the significance of multi-source data fusion. Compared with SOTA pedestrian detection and tracking methods, MDFN achieves superior performance across three datasets: Cchead, Restaurant, and Crowd of Heads Dataset (CroHD). To promote further development, we share our source code and trained models with researchers worldwide: https://github.com/kailaisun/Cchead. We hope our dataset will become an essential resource for developing pedestrian tracking in dense crowds.
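To make the input sources named in the abstract concrete, the following is a minimal sketch of how a frame difference map might be computed and stacked with an RGB frame, a depth map, and a density map. It is not the MDFN implementation: the function names, the grayscale averaging, and fusion by simple channel concatenation are all assumptions made for illustration, the depth and density maps are synthetic placeholders, and optical flow is omitted because it would require an external estimator.

```python
import numpy as np


def frame_difference(prev_rgb, curr_rgb):
    """Absolute grayscale difference of two consecutive frames, scaled to [0, 1]."""
    prev_gray = prev_rgb.astype(np.float32).mean(axis=2)
    curr_gray = curr_rgb.astype(np.float32).mean(axis=2)
    return np.abs(curr_gray - prev_gray) / 255.0


def stack_sources(curr_rgb, diff_map, depth_map, density_map):
    """Hypothetical early fusion: concatenate all sources along the channel axis.

    Returns an array of shape (H, W, 6): 3 RGB channels plus one channel
    each for the frame difference, depth, and density maps.
    """
    rgb = curr_rgb.astype(np.float32) / 255.0
    extras = [m.astype(np.float32)[..., None]
              for m in (diff_map, depth_map, density_map)]
    return np.concatenate([rgb, *extras], axis=2)


# Synthetic 4x6 frames: a single white pixel appears in the second frame.
prev_frame = np.zeros((4, 6, 3), dtype=np.uint8)
curr_frame = prev_frame.copy()
curr_frame[1, 2] = 255

diff = frame_difference(prev_frame, curr_frame)

# Placeholder maps (assumptions): a flat depth map and a density map
# with unit mass at the location of the moving pixel.
depth = np.zeros((4, 6), dtype=np.float32)
density = np.zeros((4, 6), dtype=np.float32)
density[1, 2] = 1.0

fused = stack_sources(curr_frame, diff, depth, density)
print(fused.shape)
```

The difference map is nonzero only where pixel intensity changed between frames, which is why it can serve as a cheap motion cue alongside optical flow. How MDFN actually combines these sources is described in the paper and the linked repository, not here.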
About the journal:
Artificial Intelligence (AI) is pivotal in driving the fourth industrial revolution, with remarkable advancements across various machine learning methodologies. AI techniques have become indispensable tools for practicing engineers, enabling them to tackle previously insurmountable challenges. Engineering Applications of Artificial Intelligence serves as a global platform for the swift dissemination of research on the practical application of AI methods across all engineering disciplines. Submitted papers are expected to present novel aspects of AI used in real-world engineering applications, validated on publicly available datasets to ensure that research outcomes can be replicated. Join us in exploring the transformative potential of AI in engineering.