Thermal infrared action recognition with two-stream shift Graph Convolutional Network

IF 2.4 4区 计算机科学 Q3 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE
Jishi Liu, Huanyu Wang, Junnian Wang, Dalin He, Ruihan Xu, Xiongfeng Tang
{"title":"Thermal infrared action recognition with two-stream shift Graph Convolutional Network","authors":"Jishi Liu, Huanyu Wang, Junnian Wang, Dalin He, Ruihan Xu, Xiongfeng Tang","doi":"10.1007/s00138-024-01550-2","DOIUrl":null,"url":null,"abstract":"<p>The extensive deployment of camera-based IoT devices in our society is heightening the vulnerability of citizens’ sensitive information and individual data privacy. In this context, thermal imaging techniques become essential for data desensitization, entailing the elimination of sensitive data to safeguard individual privacy. Meanwhile, thermal imaging techniques can also play a important role in industry by considering the industrial environment with low resolution, high noise and unclear objects’ features. Moreover, existing works often process the entire video as a single entity, which results in suboptimal robustness by overlooking individual actions occurring at different times. In this paper, we propose a lightweight algorithm for action recognition in thermal infrared videos using human skeletons to address this. Our approach includes YOLOv7-tiny for target detection, Alphapose for pose estimation, dynamic skeleton modeling, and Graph Convolutional Networks (GCN) for spatial-temporal feature extraction in action prediction. To overcome detection and pose challenges, we created OQ35-human and OQ35-keypoint datasets for training. Besides, the proposed model enhances robustness by using visible spectrum data for GCN training. Furthermore, we introduce the two-stream shift Graph Convolutional Network to improve the action recognition accuracy. Our experimental results on the custom thermal infrared action dataset (InfAR-skeleton) demonstrate Top-1 accuracy of 88.06% and Top-5 accuracy of 98.28%. On the filtered kinetics-skeleton dataset, the algorithm achieves Top-1 accuracy of 55.26% and Top-5 accuracy of 83.98%. Thermal Infrared Action Recognition ensures the protection of individual privacy while meeting the requirements of action recognition.</p>","PeriodicalId":51116,"journal":{"name":"Machine Vision and Applications","volume":"17 1","pages":""},"PeriodicalIF":2.4000,"publicationDate":"2024-05-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Machine Vision and Applications","FirstCategoryId":"94","ListUrlMain":"https://doi.org/10.1007/s00138-024-01550-2","RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q3","JCRName":"COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE","Score":null,"Total":0}
引用次数: 0

Abstract

The extensive deployment of camera-based IoT devices in our society is heightening the vulnerability of citizens’ sensitive information and individual data privacy. In this context, thermal imaging techniques become essential for data desensitization, entailing the elimination of sensitive data to safeguard individual privacy. Meanwhile, thermal imaging techniques can also play a important role in industry by considering the industrial environment with low resolution, high noise and unclear objects’ features. Moreover, existing works often process the entire video as a single entity, which results in suboptimal robustness by overlooking individual actions occurring at different times. In this paper, we propose a lightweight algorithm for action recognition in thermal infrared videos using human skeletons to address this. Our approach includes YOLOv7-tiny for target detection, Alphapose for pose estimation, dynamic skeleton modeling, and Graph Convolutional Networks (GCN) for spatial-temporal feature extraction in action prediction. To overcome detection and pose challenges, we created OQ35-human and OQ35-keypoint datasets for training. Besides, the proposed model enhances robustness by using visible spectrum data for GCN training. Furthermore, we introduce the two-stream shift Graph Convolutional Network to improve the action recognition accuracy. Our experimental results on the custom thermal infrared action dataset (InfAR-skeleton) demonstrate Top-1 accuracy of 88.06% and Top-5 accuracy of 98.28%. On the filtered kinetics-skeleton dataset, the algorithm achieves Top-1 accuracy of 55.26% and Top-5 accuracy of 83.98%. Thermal Infrared Action Recognition ensures the protection of individual privacy while meeting the requirements of action recognition.

Abstract Image

利用双流移位图卷积网络识别热红外动作
随着基于摄像头的物联网设备在社会中的广泛部署,公民的敏感信息和个人数据隐私变得更加脆弱。在这种情况下,红外热成像技术成为数据脱敏的必要手段,即消除敏感数据以保护个人隐私。同时,考虑到工业环境分辨率低、噪声大、物体特征不清晰等问题,热成像技术在工业领域也能发挥重要作用。此外,现有的研究通常将整个视频作为一个整体进行处理,从而忽略了不同时间发生的个别动作,导致鲁棒性不理想。针对这一问题,我们在本文中提出了一种利用人体骨骼在热红外视频中进行动作识别的轻量级算法。我们的方法包括用于目标检测的 YOLOv7-tiny、用于姿势估计的 Alphapose、动态骨架建模以及用于动作预测的时空特征提取的图卷积网络(GCN)。为了克服检测和姿势方面的挑战,我们创建了 OQ35-human 和 OQ35-keypoint 数据集进行训练。此外,通过使用可见光谱数据进行 GCN 训练,所提出的模型增强了鲁棒性。此外,我们还引入了双流移位图卷积网络,以提高动作识别的准确性。我们在定制热红外动作数据集(InfAR-skeleton)上的实验结果表明,Top-1 准确率为 88.06%,Top-5 准确率为 98.28%。在过滤动力学骨架数据集上,该算法的 Top-1 准确率为 55.26%,Top-5 准确率为 83.98%。热红外动作识别技术在满足动作识别要求的同时,确保了对个人隐私的保护。
本文章由计算机程序翻译,如有差异,请以英文原文为准。
求助全文
约1分钟内获得全文 求助全文
来源期刊
Machine Vision and Applications
Machine Vision and Applications 工程技术-工程:电子与电气
CiteScore
6.30
自引率
3.00%
发文量
84
审稿时长
8.7 months
期刊介绍: Machine Vision and Applications publishes high-quality technical contributions in machine vision research and development. Specifically, the editors encourage submittals in all applications and engineering aspects of image-related computing. In particular, original contributions dealing with scientific, commercial, industrial, military, and biomedical applications of machine vision, are all within the scope of the journal. Particular emphasis is placed on engineering and technology aspects of image processing and computer vision. The following aspects of machine vision applications are of interest: algorithms, architectures, VLSI implementations, AI techniques and expert systems for machine vision, front-end sensing, multidimensional and multisensor machine vision, real-time techniques, image databases, virtual reality and visualization. Papers must include a significant experimental validation component.
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
确定
请完成安全验证×
copy
已复制链接
快去分享给好友吧!
我知道了
右上角分享
点击右上角分享
0
联系我们:info@booksci.cn Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。 Copyright © 2023 布克学术 All rights reserved.
京ICP备2023020795号-1
ghs 京公网安备 11010802042870号
Book学术文献互助
Book学术文献互助群
群 号:481959085
Book学术官方微信