Real-World Graph Convolution Networks (RW-GCNs) for Action Recognition in Smart Video Surveillance

2021 IEEE/ACM Symposium on Edge Computing (SEC) · Pub Date: 2021-12-01 · DOI: 10.1145/3453142.3491293

Justin Sanchez, Christopher Neff, H. Tabkhi

Pages: 121-134

Citations: 7

Abstract

Action recognition is a key algorithmic component of emerging on-the-edge smart video surveillance and security systems. Skeleton-based action recognition is an attractive approach which, instead of using RGB pixel data, relies on human pose information to classify actions. However, existing algorithms often assume ideal conditions that are not representative of real-world limitations, such as noisy input, latency requirements, and edge resource constraints. To address the limitations of existing approaches, this paper presents Real-World Graph Convolution Networks (RW-GCNs), an architecture-level solution for meeting the domain constraints of real-world skeleton-based action recognition. Inspired by the presence of feedback connections in the human visual cortex, RW-GCNs leverage attentive feedback augmentation on existing near state-of-the-art (SotA) Spatial-Temporal Graph Convolution Networks (ST-GCNs). The ST-GCNs' design choices are derived from information theory-centric principles to address both the spatial and temporal noise typically encountered in end-to-end real-time and on-the-edge smart video systems. Our results demonstrate RW-GCNs' ability to serve these applications by achieving a new SotA accuracy of 94.1% on the NTU-RGB-D-120 dataset, and by achieving 32× lower latency than baseline ST-GCN applications while still reaching 90.4% accuracy on the Northwestern UCLA dataset in the presence of spatial keypoint noise. RW-GCNs further show system scalability by running on the 10× more cost-effective NVIDIA Jetson Nano (as opposed to the NVIDIA Xavier NX), while still maintaining a respectable range of throughput (15.6 to 5.5 actions per second) on the resource-constrained device. The code is available here: https://github.com/TeCSAR-UNCC/RW-GCN.
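For readers unfamiliar with skeleton-based models, the following is a minimal sketch of a generic spatial graph convolution over pose keypoints, the basic building block that ST-GCN-style networks apply per frame. It is not the RW-GCN architecture and does not implement the attentive feedback described in the abstract. The five-joint skeleton, the joint names, the edge list, and the row-normalized adjacency are all illustrative assumptions, not details taken from the paper or its repository.

```python
# Hypothetical 5-joint skeleton; joint names and edges are illustrative only.
JOINTS = ["head", "torso", "left_hand", "right_hand", "hip"]
EDGES = [(0, 1), (1, 2), (1, 3), (1, 4)]


def normalized_adjacency(num_joints, edges):
    """Build A + I (undirected edges plus self-loops) and normalize each row to sum to 1."""
    adj = [[1.0 if i == j else 0.0 for j in range(num_joints)]
           for i in range(num_joints)]
    for a, b in edges:
        adj[a][b] = 1.0
        adj[b][a] = 1.0
    return [[v / sum(row) for v in row] for row in adj]


def graph_conv(features, adj_norm, weight):
    """One spatial graph convolution: (adj_norm @ features) @ weight."""
    num_joints = len(features)
    in_ch = len(features[0])
    out_ch = len(weight[0])
    # Aggregate each joint's features with those of its neighbours.
    agg = [[sum(adj_norm[i][k] * features[k][c] for k in range(num_joints))
            for c in range(in_ch)] for i in range(num_joints)]
    # Mix channels with the weight matrix (learned in a real network).
    return [[sum(row[c] * weight[c][o] for c in range(in_ch))
             for o in range(out_ch)] for row in agg]


# (x, y) keypoints for a single frame, one row per joint in JOINTS order.
frame = [[0.0, 2.0], [0.0, 1.0], [-1.0, 1.0], [1.0, 1.0], [0.0, 0.0]]
identity = [[1.0, 0.0], [0.0, 1.0]]  # identity weights isolate the aggregation step
smoothed = graph_conv(frame, normalized_adjacency(len(JOINTS), EDGES), identity)
```

With identity weights, each output row is simply the average of a joint and its neighbours, which gives some intuition for why graph aggregation can dampen per-keypoint spatial noise. A real network would learn `weight` and stack such layers with temporal convolutions.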

