Title: Cross-transformer learning network for abnormal crowd human behavior detection from UAV captured images
Authors: Min Zhu, Dengyin Zhang
Journal: Information Processing & Management, volume 63, issue 2, Article 104374
Publication date: 2025-09-01
Publication type: Journal Article
DOI: 10.1016/j.ipm.2025.104374
URL: https://www.sciencedirect.com/science/article/pii/S0306457325003152
Impact factor: 6.9
JCR: Q1 (Computer Science, Information Systems)
Region category: Management (region 1)
Open access: No
Citations: 0
Abstract
The detection of abnormal behavior in public environments is crucial for maintaining public safety and optimizing surveillance systems. With the growing deployment of unmanned aerial vehicles (UAVs) for aerial monitoring, accurately identifying abnormal crowd behavior from UAV-captured images has become a significant challenge due to occlusions, high-density scenes, and limited spatial resolution. Traditional approaches struggle with real-time adaptability and accuracy under these complex conditions. This research therefore proposes a Cross-Transformer Learning Network that integrates spatio-temporal attention mechanisms and dynamic boundary adaptation to enhance anomaly detection in UAV surveillance data. The model cross-matches pattern boundaries and feature distributions to identify behavioral anomalies in high-density and occluded environments. It iteratively refines the learned representations until the maximum responsive pixel region is identified, which reduces variation in boundary detection and pattern extraction. The model retains critical spatio-temporal correlations across frames and improves the detection of nuanced abnormalities. By learning correlations among the training inputs, it identifies precise patterns for object, human, and crowd boundaries and uses them to detect abnormalities. Experiments on benchmark datasets, such as UCSD and Abnormal High-Density Crowds, show that the proposed approach significantly outperforms conventional models, including ConvLSTM and Hidden Markov Models (HMM). In particular, it achieves an accuracy gain of 12.31% and a recall increase of 13.09%, which supports its use in challenging UAV surveillance scenarios.
The proposed framework addresses a crucial gap in UAV-based surveillance by offering a scalable and highly precise method for detecting abnormal human behavior in complex environments, paving the way for more responsive and intelligent public safety monitoring systems.
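The abstract does not give the network's architecture, so the following is only a generic illustration of the cross-attention idea that a "cross-transformer" name suggests: tokens from one frame attend to tokens from another frame, so that each current-frame patch is described in terms of an earlier frame. It is a minimal NumPy sketch of standard scaled dot-product cross-attention, not the authors' model. The function names, token counts, dimensions, and random weights are all assumptions chosen for illustration.

```python
import numpy as np


def softmax(x, axis=-1):
    """Numerically stable softmax along the given axis."""
    shifted = x - x.max(axis=axis, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=axis, keepdims=True)


def cross_attention(queries, context, w_q, w_k, w_v):
    """Scaled dot-product cross-attention (generic, hypothetical helper).

    queries: (n_q, d_model) tokens that ask the questions,
             e.g. patch features of the current frame.
    context: (n_c, d_model) tokens that are attended to,
             e.g. patch features of an earlier frame.
    Returns the attended features (n_q, d_head) and the
    attention weights (n_q, n_c).
    """
    q = queries @ w_q
    k = context @ w_k
    v = context @ w_v
    scores = q @ k.T / np.sqrt(q.shape[-1])
    weights = softmax(scores, axis=-1)
    return weights @ v, weights


# Illustrative sizes only; nothing here comes from the paper.
rng = np.random.default_rng(0)
d_model, d_head = 16, 8
current_frame = rng.normal(size=(6, d_model))    # 6 patch tokens
previous_frame = rng.normal(size=(10, d_model))  # 10 patch tokens
w_q, w_k, w_v = (rng.normal(size=(d_model, d_head)) for _ in range(3))

attended, weights = cross_attention(
    current_frame, previous_frame, w_q, w_k, w_v
)
print(attended.shape, weights.shape)
```

Each row of `weights` sums to 1 and shows how strongly one current-frame token draws on each earlier-frame token. In a trained model the projection matrices would be learned rather than random.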
About the journal:
Information Processing and Management is dedicated to publishing cutting-edge original research at the convergence of computing and information science. Our scope encompasses theory, methods, and applications across various domains, including advertising, business, health, information science, information technology marketing, and social computing.
We aim to cater to the interests of both primary researchers and practitioners by offering an effective platform for the timely dissemination of advanced and topical issues in this interdisciplinary field. The journal places particular emphasis on original research articles, research survey articles, research method articles, and articles addressing critical applications of research. Join us in advancing knowledge and innovation at the intersection of computing and information science.