Research on human action recognition Algorithm in infrared video based on five-dimensional bidirectional dynamic convolution and multi-head attention

IF 3.4 · Zone 3 (Physics and Astrophysics) · JCR Q2 (Instruments & Instrumentation)
Jiajie Wang, Lei Yu, Shuai Yuan, Jiali Long, Wen Xie, Weiqiang Chen, Bangshu Xiong, Qiaofeng Ou
Infrared Physics & Technology, vol. 152, Article 106196. Published 2025-10-16. Journal Article.
DOI: 10.1016/j.infrared.2025.106196
https://www.sciencedirect.com/science/article/pii/S135044952500489X
Citations: 0

Abstract

Human action recognition is an active research topic in computer vision. In recent years, infrared imaging has shown distinct advantages in intelligent security and health monitoring because it works at night and protects privacy. However, infrared video is inherently low in resolution and lacks texture information, so existing methods struggle with insufficient spatiotemporal feature representation and with distinguishing highly similar actions in complex scenes. To address these issues, this paper proposes an improved model based on five-dimensional bidirectional dynamic convolution and a multi-head attention mechanism. First, to tackle feature extraction in infrared video, a five-dimensional bidirectional dynamic convolution module is designed. This module dynamically adjusts convolution kernel parameters through five types of attention weights (spatial, temporal, channel, filter, and kernel) to enhance sensitivity to low-contrast motion features. Deconvolutional residual connections are also introduced to preserve significant spatiotemporal regions and mitigate detail loss. Second, to reduce the misclassification of highly similar actions, an efficient multi-head separable attention module is proposed. This module reduces computational overhead by sharing query and key parameters between spatial-temporal and channel attention, uses dimensionality-reduction projections to compress the key and value matrices, and integrates depthwise separable modules to make feature interaction more efficient. In comparative experiments, the proposed method achieved recognition accuracies of 79.37% and 87.56% on the IITR and InfAR datasets, respectively, which the authors report as superior to the compared methods. Ablation experiments indicate that the proposed method significantly improves model accuracy.
Code is available at: https://github.com/ysls160915/C3D_FBDConv_EMHSA.
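
To make the idea of attention-weighted kernel parameters more concrete, the following is a minimal sketch of dynamic convolution along the kernel dimension only, written in plain Python on a 1D signal. It is not taken from the authors' repository and does not reproduce their module: the spatial, temporal, channel, and filter attentions, the bidirectional design, and the deconvolutional residual connections are all omitted. The function names, the two candidate kernels, and the gate values are hypothetical and chosen only for illustration.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def kernel_attention(signal, gate_weights, gate_biases):
    """Return one attention weight per candidate kernel.

    The input is summarised by global average pooling; a hypothetical
    linear gate then scores each candidate kernel.
    """
    pooled = sum(signal) / len(signal)
    scores = [w * pooled + b for w, b in zip(gate_weights, gate_biases)]
    return softmax(scores)


def dynamic_conv1d(signal, kernels, gate_weights, gate_biases):
    """Convolve `signal` with an input-dependent mixture of `kernels`."""
    attn = kernel_attention(signal, gate_weights, gate_biases)
    width = len(kernels[0])
    # Aggregate the candidate kernels into a single kernel for this input.
    mixed = [sum(a * k[i] for a, k in zip(attn, kernels)) for i in range(width)]
    # "Valid" cross-correlation, the convention in most deep-learning frameworks.
    out = [
        sum(mixed[i] * signal[t + i] for i in range(width))
        for t in range(len(signal) - width + 1)
    ]
    return out, attn


# Two hypothetical candidate kernels: a smoother and an edge detector.
KERNELS = [[1 / 3, 1 / 3, 1 / 3], [-1.0, 0.0, 1.0]]
GATE_WEIGHTS = [1.0, -1.0]  # illustrative values, not learned
GATE_BIASES = [0.0, 0.0]

if __name__ == "__main__":
    dim = [0.1, 0.2, 0.1, 0.3, 0.2]      # low-intensity input
    bright = [2.0, 2.5, 3.0, 2.5, 2.0]   # high-intensity input
    for name, sig in (("dim", dim), ("bright", bright)):
        out, attn = dynamic_conv1d(sig, KERNELS, GATE_WEIGHTS, GATE_BIASES)
        print(name, [round(a, 3) for a in attn], [round(o, 3) for o in out])
```

In this sketch the effective kernel changes with the input: a brighter signal shifts the mixture toward the smoothing kernel, while a dim signal keeps a larger share of the edge kernel. A full module in the spirit of the abstract would presumably compute further attention weights over the other four dimensions and apply them to 3D convolution kernels, but those details are not given in the text above.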
Source journal: Infrared Physics & Technology
CiteScore: 5.70
Self-citation rate: 12.10%
Articles published: 400
Review time: 67 days
Journal description: The journal covers the entire field of infrared physics and technology: theory, experiment, application, devices, and instrumentation. "Infrared" is defined as covering the near, mid, and far infrared (terahertz) regions from 0.75 µm (750 nm) to 1 mm (300 GHz). Submissions in the 300 GHz to 100 GHz region may be accepted at the editors' discretion if their content is relevant to shorter wavelengths. Submissions must be primarily concerned with and directly relevant to this spectral region. Its core topics can be summarized as the generation, propagation, and detection of infrared radiation; the associated optics, materials, and devices; and its use in all fields of science, industry, engineering, and medicine. Infrared techniques occur in many different fields, notably spectroscopy and interferometry; material characterization and processing; and atmospheric physics, astronomy, and space research. Scientific aspects include lasers, quantum optics, quantum electronics, image processing, and semiconductor physics. Some important applications are medical diagnostics and treatment, industrial inspection, and environmental monitoring.