Towards pedestrian head tracking: A benchmark dataset and a multi-source data fusion network

IF 7.5 | Zone 2, Computer Science | Q1, AUTOMATION & CONTROL SYSTEMS
Kailai Sun, Xinwei Wang, Shaobo Liu, Qianchuan Zhao, Gao Huang, Chang Liu
Journal: Engineering Applications of Artificial Intelligence, vol. 158, Article 111265
Published: 2025-06-11
DOI: 10.1016/j.engappai.2025.111265
Source: https://www.sciencedirect.com/science/article/pii/S0952197625012667
Citations: 0

Abstract

Pedestrian detection and tracking in crowded video sequences have many applications, including autonomous driving, robot navigation, and pedestrian flow analysis. However, detecting and tracking pedestrians in high-density crowds is difficult because of intra-class occlusions, complex motions, and diverse poses. Although artificial intelligence (AI) models have made great progress in head detection, head tracking datasets and methods remain scarce. Existing head datasets have limited coverage of complex pedestrian flows and scenes (e.g., pedestrian interactions, occlusions, and object interference), so new head tracking datasets and methods are needed. To address these challenges, we present a Chinese Large-scale Cross-scene Pedestrian Head Tracking dataset (Cchead) and a Multi-source Data Fusion Network (MDFN). The dataset covers 10 diverse scenes with 50,528 frames, about 2,366,249 heads, and 2,358 tracks. It contains diverse pedestrian moving speeds and directions, as well as complex crowd flows with collision avoidance behaviors. Existing state-of-the-art (SOTA) algorithms are tested and compared on the Cchead dataset. MDFN is the first end-to-end convolutional neural network (CNN)-based head detection and tracking network that is jointly trained on RGB (red, green, blue) frames, pixel-level motion information (optical flow and frame difference maps), depth maps, and density maps from videos. Ablation experiments confirm the significance of multi-source data fusion. Compared with SOTA pedestrian detection and tracking methods, MDFN achieves superior performance across three datasets: Cchead, Restaurant, and Crowd of Heads Dataset (CroHD). To promote further development, we share our source code and trained models with researchers worldwide: https://github.com/kailaisun/Cchead. We hope our dataset will become an essential resource for developing pedestrian tracking in dense crowds.
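
The abstract lists frame difference maps among the pixel-level motion cues that MDFN takes as input, but it does not say how they are computed. As a minimal sketch, assuming the common definition (per-pixel absolute difference between two consecutive grayscale frames), the idea can be illustrated in plain Python. The function names, the toy frames, and the threshold of 25 are illustrative assumptions and are not taken from the paper or its repository.

```python
from typing import List

# A grayscale frame as rows of 0-255 intensities (hypothetical toy representation).
Frame = List[List[int]]


def frame_difference(prev: Frame, curr: Frame) -> Frame:
    """Per-pixel absolute intensity difference between two equally sized frames."""
    if len(prev) != len(curr) or any(len(p) != len(c) for p, c in zip(prev, curr)):
        raise ValueError("frames must have the same shape")
    return [
        [abs(c - p) for p, c in zip(prev_row, curr_row)]
        for prev_row, curr_row in zip(prev, curr)
    ]


def motion_mask(diff: Frame, threshold: int = 25) -> Frame:
    """Binarize a difference map: 1 where the change exceeds the threshold."""
    return [[1 if value > threshold else 0 for value in row] for row in diff]


# Toy 3x4 frames: one bright pixel moves one column to the right.
frame_t0 = [
    [10, 10, 10, 10],
    [10, 200, 10, 10],
    [10, 10, 10, 10],
]
frame_t1 = [
    [10, 10, 10, 10],
    [10, 10, 200, 10],
    [10, 10, 10, 10],
]

diff_map = frame_difference(frame_t0, frame_t1)
mask = motion_mask(diff_map)

for row in mask:
    print(row)
```

Both the position the pixel left and the position it entered light up in the mask, which is why a difference map signals that something moved but not in which direction; optical flow, the other motion cue named in the abstract, carries the direction.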
Source journal
Engineering Applications of Artificial Intelligence (Engineering Technology; Engineering: Electrical & Electronic)
CiteScore: 9.60
Self-citation rate: 10.00%
Articles published: 505
Review time: 68 days
Journal description: Artificial Intelligence (AI) is pivotal in driving the fourth industrial revolution, witnessing remarkable advancements across various machine learning methodologies. AI techniques have become indispensable tools for practicing engineers, enabling them to tackle previously insurmountable challenges. Engineering Applications of Artificial Intelligence serves as a global platform for the swift dissemination of research elucidating the practical application of AI methods across all engineering disciplines. Submitted papers are expected to present novel aspects of AI utilized in real-world engineering applications, validated using publicly available datasets to ensure the replicability of research outcomes. Join us in exploring the transformative potential of AI in engineering.