Dynamic graph neural networks for UAV-based group activity recognition in structured team sports.

Ishrat Zahra, Yanfeng Wu, Haifa F Alhasson, Shuaa S Alharbi, Hanan Aljuaid, Ahmad Jalal, Hui Liu

Frontiers in Neurorobotics, 19:1631998. Published 2025-09-08. DOI: 10.3389/fnbot.2025.1631998. Open-access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12452097/pdf/
Introduction: Understanding group actions in real-world settings is essential for advancing applications in surveillance, robotics, and autonomous systems. Group activity recognition, particularly in sports scenarios, presents unique challenges due to dynamic interactions, occlusions, and varying viewpoints. To address these challenges, we develop a deep learning system that recognizes multi-person behaviors by integrating appearance-based features (HOG, LBP, SIFT), skeletal data (MediaPipe), and motion-context features (MOCON). Our approach employs a Dynamic Graph Neural Network (DGNN) and Bi-LSTM architecture, enabling robust recognition of group activities in diverse and dynamic environments. To further validate the framework's adaptability, we evaluate it on the Volleyball dataset and the UAV-recorded SoccerTrack dataset, which offer distinct perspectives and challenges.
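As a rough illustration of the fusion-plus-Bi-LSTM stage described above, the sketch below concatenates per-frame appearance, skeletal, and motion descriptors for each tracked sequence and classifies the clip with a bidirectional LSTM. This is a minimal PyTorch sketch under assumed feature dimensions and a placeholder classifier head (`FusionBiLSTM`, `appearance_dim`, etc. are illustrative names), not the paper's actual configuration.

```python
import torch
import torch.nn as nn

class FusionBiLSTM(nn.Module):
    """Illustrative sketch: fuse per-player appearance (e.g., HOG/LBP/SIFT),
    skeletal (MediaPipe), and motion (MOCON) descriptors per frame, then
    model their temporal evolution with a Bi-LSTM. All dimensions are
    placeholder assumptions, not values from the paper."""

    def __init__(self, appearance_dim=256, skeleton_dim=99, motion_dim=64,
                 hidden_dim=128, num_classes=8):
        super().__init__()
        fused_dim = appearance_dim + skeleton_dim + motion_dim
        self.bilstm = nn.LSTM(fused_dim, hidden_dim,
                              batch_first=True, bidirectional=True)
        self.classifier = nn.Linear(2 * hidden_dim, num_classes)

    def forward(self, appearance, skeleton, motion):
        # Each input: (batch, time, feature_dim); concatenate per frame.
        x = torch.cat([appearance, skeleton, motion], dim=-1)
        out, _ = self.bilstm(x)              # (batch, time, 2 * hidden_dim)
        return self.classifier(out[:, -1])   # logits from the final step
```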
Method: Our framework integrates YOLOv11 for object detection and SORT for tracking to extract multi-modal features, including HOG, LBP, SIFT, skeletal data (MediaPipe), and motion context (MOCON). These features are optimized using genetic algorithms and fused within a Dynamic Graph Neural Network (DGNN), which models players as nodes in a spatio-temporal graph, effectively capturing both spatial formations and temporal dynamics.
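To make the players-as-nodes idea concrete, the sketch below builds a spatio-temporal graph from tracked player positions (spatial edges between nearby players within a frame, temporal edges linking each track across consecutive frames) and applies one round of mean-neighborhood message passing. The helper names, the distance-threshold edge rule, and the update rule are assumptions for illustration; the abstract does not specify the DGNN's exact edge construction or update equations.

```python
import torch
import torch.nn as nn

def build_spatiotemporal_adjacency(positions, radius=5.0):
    """Hypothetical helper: connect players within `radius` of each other
    in the same frame, plus each player to itself in the next frame.
    positions: (T, N, 2) tracked court coordinates for N players."""
    T, N, _ = positions.shape
    A = torch.zeros(T * N, T * N)
    for t in range(T):
        d = torch.cdist(positions[t], positions[t])        # (N, N) distances
        A[t*N:(t+1)*N, t*N:(t+1)*N] = (d < radius).float() # spatial edges
        if t + 1 < T:                                      # temporal edges
            idx = torch.arange(N)
            A[t*N + idx, (t+1)*N + idx] = 1.0
            A[(t+1)*N + idx, t*N + idx] = 1.0
    return A

class GraphLayer(nn.Module):
    """One round of mean-neighborhood message passing, a stand-in for the
    paper's DGNN update, whose exact form the abstract does not give."""
    def __init__(self, in_dim, out_dim):
        super().__init__()
        self.lin = nn.Linear(in_dim, out_dim)

    def forward(self, x, A):
        # x: (T*N, in_dim) node features; A: (T*N, T*N) adjacency.
        deg = A.sum(dim=-1, keepdim=True).clamp(min=1.0)
        return torch.relu(self.lin(A @ x / deg))
```

Treating spatial and temporal links as one joint graph lets a single message-passing stack capture both team formations and per-player motion, which is the property the abstract attributes to the DGNN.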
Results: We evaluated our framework on three datasets: a volleyball dataset, the UAV-based SoccerTrack soccer dataset, and an NBA basketball dataset. Our system achieved 94.5% accuracy on the volleyball dataset (mAP: 94.2%, MPCA: 93.8%) with an inference time of 0.18 s per frame. On the SoccerTrack UAV dataset, accuracy was 91.8% (mAP: 91.5%, MPCA: 90.5%) at 0.20 s per frame, and on the NBA basketball dataset it was 91.1% (mAP: 90.8%, MPCA: 89.8%), also at 0.20 s per frame. These results highlight the framework's accuracy and computational efficiency across different sports and camera perspectives.
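For readers unfamiliar with the MPCA figures reported above: assuming MPCA denotes mean per-class accuracy (the abstract does not expand the acronym), it can be computed as in the sketch below, which averages per-class recall so that frequent activity classes cannot dominate the score.

```python
import numpy as np

def mean_per_class_accuracy(y_true, y_pred, num_classes):
    """Assumed reading of MPCA: average the recall of each activity
    class that appears in the ground truth."""
    accs = []
    for c in range(num_classes):
        mask = (y_true == c)
        if mask.any():
            accs.append((y_pred[mask] == c).mean())
    return float(np.mean(accs))
```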
Discussion: Our approach demonstrates robust performance in recognizing multi-person actions across diverse conditions, highlighting its adaptability to both conventional and UAV-based video sources.
Journal introduction:
Frontiers in Neurorobotics publishes rigorously peer-reviewed research in the science and technology of embodied autonomous neural systems. Specialty Chief Editors Alois C. Knoll and Florian Röhrbein at the Technische Universität München are supported by an outstanding Editorial Board of international experts. This multidisciplinary open-access journal is at the forefront of disseminating and communicating scientific knowledge and impactful discoveries to researchers, academics and the public worldwide.
Neural systems include brain-inspired algorithms (e.g. connectionist networks), computational models of biological neural networks (e.g. artificial spiking neural nets, large-scale simulations of neural microcircuits) and actual biological systems (e.g. in vivo and in vitro neural nets). The focus of the journal is the embodiment of such neural systems in artificial software and hardware devices, machines, robots or any other form of physical actuation. This also includes prosthetic devices, brain machine interfaces, wearable systems, micro-machines, furniture, home appliances, as well as systems for managing micro and macro infrastructures. Frontiers in Neurorobotics also aims to publish radically new tools and methods to study plasticity and development of autonomous self-learning systems that are capable of acquiring knowledge in an open-ended manner. Models complemented with experimental studies revealing self-organizing principles of embodied neural systems are welcome. Our journal also publishes on the micro and macro engineering and mechatronics of robotic devices driven by neural systems, as well as studies on the impact that such systems will have on our daily life.