{"title":"EgoFormer:自动驾驶背景下的自我姿态分类","authors":"Tayeba Qazi;M. Rupesh Kumar;Prerana Mukherjee;Brejesh Lall","doi":"10.1109/JSEN.2024.3390794","DOIUrl":null,"url":null,"abstract":"Decoding the intentions of passengers and other road users remains a critical challenge for autonomous vehicles (AVs) and intelligent transportation systems. Hand gestures are key in these interactions, offering a direct communication channel. Moreover, egocentric videos mimic a first-person perspective, aligning closely with human visual perception. Yet, the development of deep learning algorithms for detecting egocentric hand gestures in autonomous driving is hindered by the absence of useful datasets. Furthermore, there is a pressing need for gesture recognition methods to evolve from convolutional neural network (CNN)-based architectures to transformer models. To address these challenges, we present EgoDriving, a novel dataset of egocentric hand gestures, curated for driving-related hand gestures. Finally, we introduce EgoFormer, an efficient video transformer for egocentric hand gesture classification that is optimized for edge-computing deployments. EgoFormer incorporates a video dynamic position bias (VDPB) module to enhance long-range positional awareness and leverage absolute positions from convolutional sub-layers within its transformer blocks. Designed for low-resource settings, EgoFormer offers substantial reductions in inference latency and GPU utilization while maintaining competitive accuracy against the state-of-the-art hand gesture recognition frameworks.","PeriodicalId":447,"journal":{"name":"IEEE Sensors Journal","volume":"24 11","pages":"18133-18140"},"PeriodicalIF":4.3000,"publicationDate":"2024-04-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"EgoFormer: Ego-Gesture Classification in Context of Autonomous Driving\",\"authors\":\"Tayeba Qazi;M. Rupesh Kumar;Prerana Mukherjee;Brejesh Lall\",\"doi\":\"10.1109/JSEN.2024.3390794\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Decoding the intentions of passengers and other road users remains a critical challenge for autonomous vehicles (AVs) and intelligent transportation systems. Hand gestures are key in these interactions, offering a direct communication channel. Moreover, egocentric videos mimic a first-person perspective, aligning closely with human visual perception. Yet, the development of deep learning algorithms for detecting egocentric hand gestures in autonomous driving is hindered by the absence of useful datasets. Furthermore, there is a pressing need for gesture recognition methods to evolve from convolutional neural network (CNN)-based architectures to transformer models. To address these challenges, we present EgoDriving, a novel dataset of egocentric hand gestures, curated for driving-related hand gestures. Finally, we introduce EgoFormer, an efficient video transformer for egocentric hand gesture classification that is optimized for edge-computing deployments. EgoFormer incorporates a video dynamic position bias (VDPB) module to enhance long-range positional awareness and leverage absolute positions from convolutional sub-layers within its transformer blocks. 
Designed for low-resource settings, EgoFormer offers substantial reductions in inference latency and GPU utilization while maintaining competitive accuracy against the state-of-the-art hand gesture recognition frameworks.\",\"PeriodicalId\":447,\"journal\":{\"name\":\"IEEE Sensors Journal\",\"volume\":\"24 11\",\"pages\":\"18133-18140\"},\"PeriodicalIF\":4.3000,\"publicationDate\":\"2024-04-24\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"IEEE Sensors Journal\",\"FirstCategoryId\":\"103\",\"ListUrlMain\":\"https://ieeexplore.ieee.org/document/10508297/\",\"RegionNum\":2,\"RegionCategory\":\"综合性期刊\",\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"Q1\",\"JCRName\":\"ENGINEERING, ELECTRICAL & ELECTRONIC\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"IEEE Sensors Journal","FirstCategoryId":"103","ListUrlMain":"https://ieeexplore.ieee.org/document/10508297/","RegionNum":2,"RegionCategory":"综合性期刊","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"ENGINEERING, ELECTRICAL & ELECTRONIC","Score":null,"Total":0}
Abstract: Decoding the intentions of passengers and other road users remains a critical challenge for autonomous vehicles (AVs) and intelligent transportation systems. Hand gestures are key in these interactions, offering a direct communication channel. Moreover, egocentric videos mimic a first-person perspective, aligning closely with human visual perception. Yet, the development of deep learning algorithms for detecting egocentric hand gestures in autonomous driving is hindered by the absence of suitable datasets. Furthermore, there is a pressing need for gesture recognition methods to evolve from convolutional neural network (CNN)-based architectures to transformer models. To address these challenges, we present EgoDriving, a novel dataset of egocentric hand gestures curated for driving-related interactions. We also introduce EgoFormer, an efficient video transformer for egocentric hand gesture classification, optimized for edge-computing deployments. EgoFormer incorporates a video dynamic position bias (VDPB) module to enhance long-range positional awareness and to leverage absolute positions from convolutional sub-layers within its transformer blocks. Designed for low-resource settings, EgoFormer substantially reduces inference latency and GPU utilization while maintaining accuracy competitive with state-of-the-art hand gesture recognition frameworks.
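To make the architectural idea concrete, the sketch below shows one way a video transformer block can combine a depthwise-convolutional sub-layer (supplying absolute positional cues) with a dynamic position bias added to the attention logits, in the spirit of the VDPB module mentioned in the abstract. This is a minimal illustrative PyTorch sketch under assumed design choices; the class names, token shapes, and wiring are not taken from the paper and should not be read as EgoFormer's actual implementation.

```python
# Illustrative sketch only: conv positional sub-layer + dynamic position bias
# attention. All sizes and module names are assumptions, not the authors' code.
import torch
import torch.nn as nn


class DynamicPositionBias(nn.Module):
    """Maps relative (t, h, w) offsets to per-head attention biases via a small MLP."""

    def __init__(self, num_heads: int, hidden: int = 64):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(3, hidden), nn.ReLU(inplace=True),
            nn.Linear(hidden, num_heads),
        )

    def forward(self, rel_coords: torch.Tensor) -> torch.Tensor:
        # rel_coords: (L, L, 3) relative offsets between all pairs of video tokens
        return self.mlp(rel_coords).permute(2, 0, 1)  # -> (num_heads, L, L)


class VideoBlock(nn.Module):
    """One block: conv positional sub-layer -> biased self-attention -> MLP."""

    def __init__(self, dim: int, num_heads: int, video_shape=(8, 7, 7)):
        super().__init__()
        self.video_shape = video_shape  # (T, H, W) tokens per clip (assumed)
        # Depthwise 3-D conv injects absolute positional information.
        self.pos_conv = nn.Conv3d(dim, dim, kernel_size=3, padding=1, groups=dim)
        self.norm1 = nn.LayerNorm(dim)
        self.attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.dpb = DynamicPositionBias(num_heads)
        self.norm2 = nn.LayerNorm(dim)
        self.mlp = nn.Sequential(nn.Linear(dim, 4 * dim), nn.GELU(), nn.Linear(4 * dim, dim))

    def _bias(self, device) -> torch.Tensor:
        # Recomputed each call for simplicity; a real model would cache this.
        T, H, W = self.video_shape
        coords = torch.stack(torch.meshgrid(
            torch.arange(T), torch.arange(H), torch.arange(W), indexing="ij"
        ), dim=-1).reshape(-1, 3).float().to(device)   # (L, 3)
        rel = coords[:, None, :] - coords[None, :, :]  # (L, L, 3)
        return self.dpb(rel)                           # (num_heads, L, L)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (B, L, C) with L = T*H*W flattened video tokens
        B, L, C = x.shape
        T, H, W = self.video_shape
        v = x.transpose(1, 2).reshape(B, C, T, H, W)
        x = x + self.pos_conv(v).reshape(B, C, L).transpose(1, 2)
        # Dynamic position bias is applied as an additive attention mask.
        bias = self._bias(x.device).repeat(B, 1, 1)    # (B*num_heads, L, L)
        h = self.norm1(x)
        attn_out, _ = self.attn(h, h, h, attn_mask=bias)
        x = x + attn_out
        return x + self.mlp(self.norm2(x))
```

For instance, `VideoBlock(dim=128, num_heads=4)(torch.randn(2, 8 * 7 * 7, 128))` returns a tensor of the same shape; stacking such blocks over patch-embedded video tokens and adding pooling plus a classification head would yield a gesture classifier, though the paper should be consulted for EgoFormer's actual configuration.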
Journal introduction:
The fields of interest of the IEEE Sensors Journal are the theory, design, fabrication, manufacturing, and applications of devices for sensing and transducing physical, chemical, and biological phenomena, with emphasis on the electronics and physics aspects of sensors and integrated sensor-actuators. IEEE Sensors Journal deals with the following:
-Sensor Phenomenology, Modelling, and Evaluation
-Sensor Materials, Processing, and Fabrication
-Chemical and Gas Sensors
-Microfluidics and Biosensors
-Optical Sensors
-Physical Sensors: Temperature, Mechanical, Magnetic, and others
-Acoustic and Ultrasonic Sensors
-Sensor Packaging
-Sensor Networks
-Sensor Applications
-Sensor Systems: Signals, Processing, and Interfaces
-Actuators and Sensor Power Systems
-Sensor Signal Processing for high precision and stability (amplification, filtering, linearization, modulation/demodulation) and under harsh conditions (EMC, radiation, humidity, temperature); energy consumption/harvesting
-Sensor Data Processing (soft computing with sensor data, e.g., pattern recognition, machine learning, evolutionary computation; sensor data fusion; processing of wave (e.g., electromagnetic, acoustic) and non-wave (e.g., chemical, gravity, particle, thermal, radiative and non-radiative) sensor data; detection, estimation, and classification based on sensor data)
-Sensors in Industrial Practice