Title: Graph-based joint detection and tracking with Euclidean edges for multi-object video analysis
Authors: Nozha Jlidi, Sameh Kouni, Olfa Jemai, Tahani Bouchrika
Journal: Displays, volume 91, Article 103229
Publication date: 2025-09-22
Publication type: Journal Article
DOI: 10.1016/j.displa.2025.103229
URL: https://www.sciencedirect.com/science/article/pii/S0141938225002665
Impact factor: 3.4 (JCR Q1, Computer Science, Hardware & Architecture)
Citations: 0
Abstract
Human detection and tracking are crucial tasks in computer vision, involving the identification and monitoring of individuals within specific areas, with applications in robotics, surveillance, and autonomous vehicles. These tasks face challenges due to variable environments, overlapping subjects, and computational limitations. To address these, we propose a novel approach using Graph Neural Networks (GNN) for joint detection and tracking (JDT) of humans in videos. Our method converts video into a graph, where nodes represent detected individuals, and edges represent connections between nodes across different frames. Node associations are established by measuring Euclidean distances between neighboring nodes, and the closest nodes are selected to form edges. This process is iteratively applied across all pairs of frames, resulting in a comprehensive graph structure for tracking. Our GNN-based JDT model was evaluated on the MOT16, MOT17, and MOT20 datasets, achieving MOTA of 85.2, ML of 11, IDF1 of 46, and MT of 65.7 on the MOT16 dataset, MOTA of 86.7 and IDF1 of 72.7 on the MOT17 dataset, and MOTA of 73.5 and IDF1 of 71.2 on the MOT20 dataset. The results demonstrate that our model outperforms existing state-of-the-art methods in both accuracy and efficiency. Through this innovative graph-based method, we contribute a robust and scalable solution to the field of human detection and tracking.
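The edge-construction step described in the abstract can be illustrated with a small sketch. This is not the authors' implementation: it assumes that each detection is reduced to a 2D centre point, that "pairs of frames" means consecutive frames, and that a hypothetical `max_dist` threshold discards implausibly long links. The function name `build_euclidean_edges` and the sample coordinates are likewise made up for illustration, and the GNN that would consume the resulting graph is not shown.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float]      # (x, y) centre of a detected person (assumed representation)
Node = Tuple[int, int]           # (frame index, detection index within that frame)
Edge = Tuple[Node, Node, float]  # (source node, target node, Euclidean distance)


def build_euclidean_edges(frames: List[List[Point]],
                          max_dist: float = float("inf")) -> List[Edge]:
    """Link each detection in frame t to its closest detection in frame t + 1."""
    edges: List[Edge] = []
    for t in range(len(frames) - 1):
        current, following = frames[t], frames[t + 1]
        if not following:
            continue  # no detections in the next frame, so nothing to connect to
        for i, p in enumerate(current):
            # Euclidean distance from detection i to every detection in the next frame.
            dists = [math.dist(p, q) for q in following]
            # Index of the closest candidate in the next frame.
            j = min(range(len(dists)), key=dists.__getitem__)
            # max_dist is an illustrative assumption, not a parameter from the paper.
            if dists[j] <= max_dist:
                edges.append(((t, i), (t + 1, j), dists[j]))
    return edges


# Toy input: three frames of made-up centre points.
frames = [
    [(10.0, 10.0), (50.0, 50.0)],  # frame 0: two people
    [(52.0, 51.0), (11.0, 12.0)],  # frame 1: the same two people, listed in a different order
    [(12.0, 14.0)],                # frame 2: only one person is still detected
]
edges = build_euclidean_edges(frames, max_dist=20.0)
for src, dst, dist in edges:
    print(src, "->", dst, round(dist, 2))
```

On this toy input the sketch produces three edges: both people are linked from frame 0 to frame 1 despite the changed ordering, and only the person near (11, 12) is linked into frame 2, because the other detection has no candidate within `max_dist`. A plain nearest-neighbour rule like this can attach two detections to the same target; how the published method handles such conflicts is not stated in the abstract.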
About the journal:
Displays is the international journal covering the research and development of display technology, its effective presentation and perception of information, and applications and systems including display-human interface.
Technical papers on practical developments in Displays technology provide an effective channel to promote greater understanding and cross-fertilization across the diverse disciplines of the Displays community. Original research papers solving ergonomics issues at the display-human interface advance effective presentation of information. Tutorial papers covering fundamentals, intended for display technology and human factors engineers new to the field, will also occasionally be featured.