Research on human action recognition algorithm in infrared video based on five-dimensional bidirectional dynamic convolution and multi-head attention
Jiajie Wang, Lei Yu, Shuai Yuan, Jiali Long, Wen Xie, Weiqiang Chen, Bangshu Xiong, Qiaofeng Ou
Infrared Physics & Technology, volume 152, Article 106196. Published 2025-10-16. DOI: 10.1016/j.infrared.2025.106196
https://www.sciencedirect.com/science/article/pii/S135044952500489X
Abstract
Human action recognition is an active research topic in computer vision. In recent years, infrared imaging has shown distinct advantages in intelligent security and health monitoring because it works at night and helps protect privacy. However, infrared video is inherently low in resolution and lacks texture information, so existing methods suffer from insufficient spatiotemporal feature representation and have difficulty distinguishing highly similar actions in complex scenes. To address these issues, this paper proposes an improved model based on five-dimensional bidirectional dynamic convolution and a multi-head attention mechanism. First, to tackle feature extraction in infrared video, a five-dimensional bidirectional dynamic convolution module is designed. This module dynamically adjusts convolution kernel parameters through five types of attention weights (over the spatial, temporal, channel, filter, and kernel dimensions) to enhance sensitivity to low-contrast motion features. Deconvolutional residual connections are also introduced to preserve significant spatiotemporal regions and mitigate detail loss. Second, to reduce the misclassification of highly similar actions, an efficient multi-head separable attention module is proposed. This module reduces computational overhead by sharing query and key parameters for spatial–temporal and channel attention, applies dimensionality-reducing projections to compress the key and value matrices, and integrates depthwise separable modules to further improve the efficiency of feature interaction. In comparative experiments, the proposed method achieved recognition accuracies of 79.37% and 87.56% on the IITR and InfAR datasets, respectively, demonstrating its advantage over the compared methods. Ablation experiments indicate that the method significantly improves model accuracy.
Code is available at: https://github.com/ysls160915/C3D_FBDConv_EMHSA.
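As a rough illustration of the idea that attention weights can adjust a convolution kernel per input, the sketch below mixes several candidate kernels with softmax weights computed from the input itself. It is a simplified 1D toy covering only attention over the kernel dimension. It is not the authors' five-dimensional module (see the repository above for that), and the names `dynamic_conv1d`, `mix_kernels`, and `gate` are hypothetical.

```python
import math
from typing import List, Sequence, Tuple


def softmax(scores: Sequence[float]) -> List[float]:
    """Numerically stable softmax over a list of scores."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def mix_kernels(kernels: Sequence[Sequence[float]],
                weights: Sequence[float]) -> List[float]:
    """Weighted sum of candidate kernels (all of equal length)."""
    size = len(kernels[0])
    return [sum(w * k[i] for w, k in zip(weights, kernels))
            for i in range(size)]


def conv1d_valid(signal: Sequence[float],
                 kernel: Sequence[float]) -> List[float]:
    """Plain 'valid' cross-correlation of a 1D signal with a kernel."""
    n, k = len(signal), len(kernel)
    return [sum(signal[i + j] * kernel[j] for j in range(k))
            for i in range(n - k + 1)]


def dynamic_conv1d(signal: Sequence[float],
                   kernels: Sequence[Sequence[float]],
                   gate: Sequence[Tuple[float, float]]
                   ) -> Tuple[List[float], List[float]]:
    """Convolve with a kernel that depends on the input.

    `gate` is an assumed stand-in for a learned attention branch: one
    (slope, bias) pair per candidate kernel, applied to the signal mean.
    """
    context = sum(signal) / len(signal)          # global average pooling
    scores = [a * context + b for a, b in gate]  # one score per kernel
    weights = softmax(scores)                    # attention over kernels
    kernel = mix_kernels(kernels, weights)       # input-specific kernel
    return conv1d_valid(signal, kernel), weights


# Two candidate kernels: an edge detector and a smoother.
kernels = [[1.0, 0.0, -1.0], [1 / 3, 1 / 3, 1 / 3]]
gate = [(1.0, 0.0), (-1.0, 0.0)]
output, weights = dynamic_conv1d([0.0, 0.0, 1.0, 1.0, 1.0], kernels, gate)
```

In this toy setting a brighter input shifts weight toward the first kernel. A full module would presumably compute separate weights for the other dimensions named in the abstract and operate on 3D video tensors.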
About the journal:
The Journal covers the entire field of infrared physics and technology: theory, experiment, application, devices and instrumentation. "Infrared" is defined as covering the near, mid and far infrared (terahertz) regions from 0.75 µm (750 nm) to 1 mm (300 GHz). Submissions in the 300 GHz to 100 GHz region may be accepted at the editors' discretion if their content is relevant to shorter wavelengths. Submissions must be primarily concerned with and directly relevant to this spectral region.
Its core topics can be summarized as the generation, propagation and detection of infrared radiation; the associated optics, materials and devices; and its use in all fields of science, industry, engineering and medicine.
Infrared techniques occur in many different fields, notably spectroscopy and interferometry; material characterization and processing; atmospheric physics, astronomy and space research. Scientific aspects include lasers, quantum optics, quantum electronics, image processing and semiconductor physics. Some important applications are medical diagnostics and treatment, industrial inspection and environmental monitoring.