A deep learning framework for multi-object tracking in team sports videos

IF 1.5, Zone 4 (Computer Science), JCR Q4 (COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE)

IET Computer Vision; Pub Date: 2024-01-02; DOI: 10.1049/cvi2.12266

Wei Cao, Xiaoyong Wang, Xianxiang Liu, Yishuai Xu

Volume 18, Issue 5, pages 574-590. Full text: https://onlinelibrary.wiley.com/doi/10.1049/cvi2.12266


Abstract

Multi-Object Tracking (MOT) in sports scenes faces severe occlusions, similar appearances, drastic pose changes, and complex motion patterns. In response to these challenges, a deep-learning framework, CTGMOT (CNN-Transformer-GNN-based MOT), is proposed specifically for tracking multiple athletes in sports videos; it jointly models detection, appearance, and motion features. First, a detection network that combines Convolutional Neural Networks (CNN) and Transformers is constructed to extract both local and global features from images, and appearance and motion features are fused through parallel dual-branch decoders. Second, graph models are built using Graph Neural Networks (GNN) to accurately capture the spatio-temporal correlations between object and trajectory features from inter-frame and intra-frame associations. Experimental results on the public sports tracking dataset SportsMOT show that the proposed framework outperforms other state-of-the-art MOT methods in complex sports scenes. In addition, the proposed framework shows excellent generality on the benchmark datasets MOT17 and MOT20.
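To make the idea of combining appearance and motion cues concrete, the following is a minimal sketch of track-to-detection association. It is not the CTGMOT method: the abstract describes learned dual-branch decoders and GNN-based association, whereas this sketch substitutes a hand-written weighted sum of cosine similarity (appearance) and box IoU (motion), followed by greedy matching. The function names and the parameters `alpha` and `min_affinity` are assumptions chosen for illustration.

```python
import math


def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def cosine(u, v):
    """Cosine similarity of two equal-length feature vectors."""
    dot = sum(x * y for x, y in zip(u, v))
    norm = math.sqrt(sum(x * x for x in u)) * math.sqrt(sum(y * y for y in v))
    return dot / norm if norm > 0 else 0.0


def associate(tracks, detections, alpha=0.5, min_affinity=0.3):
    """Greedily match tracks to detections by a fused affinity score.

    Each track/detection is a dict with a 'box' and a 'feat' entry.
    alpha weights appearance against motion (hypothetical parameter).
    Returns a sorted list of (track_index, detection_index) pairs.
    """
    scored = []
    for ti, t in enumerate(tracks):
        for di, d in enumerate(detections):
            appearance = cosine(t["feat"], d["feat"])
            motion = iou(t["box"], d["box"])
            affinity = alpha * appearance + (1 - alpha) * motion
            if affinity >= min_affinity:
                scored.append((affinity, ti, di))

    # Highest-affinity pairs are accepted first; each index is used once.
    scored.sort(reverse=True)
    used_tracks, used_dets, matches = set(), set(), []
    for _, ti, di in scored:
        if ti not in used_tracks and di not in used_dets:
            matches.append((ti, di))
            used_tracks.add(ti)
            used_dets.add(di)
    return sorted(matches)
```

When two athletes wear the same kit, their appearance scores are nearly identical and the IoU term decides the match; when they cross paths, the appearance term helps instead. A practical tracker would usually replace the greedy loop with an optimal assignment solver, and a learned model such as the one described above would replace the fixed weighting.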




Source journal

IET Computer Vision (Engineering & Technology: Engineering, Electrical & Electronic)

CiteScore: 3.30

Self-citation rate: 11.80%

Review time: 3.4 months

About the journal: IET Computer Vision seeks original research papers in a wide range of areas of computer vision. The journal aims to publish the highest-quality research that is relevant and topical to the field, without forgetting works that aim to introduce new horizons and set the agenda for future avenues of research in computer vision.

IET Computer Vision welcomes submissions on the following topics: biologically and perceptually motivated approaches to low-level vision (feature detection, etc.); perceptual grouping and organisation; representation, analysis and matching of 2D and 3D shape; shape-from-X; object recognition; image understanding; learning with visual inputs; motion analysis and object tracking; multiview scene analysis; cognitive approaches in low-, mid- and high-level vision; control in visual systems; colour, reflectance and light; statistical and probabilistic models; face and gesture; surveillance; biometrics and security; robotics; vehicle guidance; automatic model acquisition; medical image analysis and understanding; aerial scene analysis and remote sensing; deep learning models in computer vision.

Both methodological and applications-orientated papers are welcome. Submitted manuscripts are expected to include a detailed and analytical review of the literature, a state-of-the-art exposition of the proposed research and its methodology, a thorough experimental evaluation, and a comparative evaluation against relevant state-of-the-art methods. Submissions that do not meet these minimum requirements may be returned to authors without being sent for review.

Special Issues, Current Calls for Papers:
Computer Vision for Smart Cameras and Camera Networks - https://digital-library.theiet.org/files/IET_CVI_SC.pdf
Computer Vision for the Creative Industries - https://digital-library.theiet.org/files/IET_CVI_CVCI.pdf