{"title":"EgoFormer:自动驾驶背景下的自我姿态分类","authors":"Tayeba Qazi;M. Rupesh Kumar;Prerana Mukherjee;Brejesh Lall","doi":"10.1109/JSEN.2024.3390794","DOIUrl":null,"url":null,"abstract":"Decoding the intentions of passengers and other road users remains a critical challenge for autonomous vehicles (AVs) and intelligent transportation systems. Hand gestures are key in these interactions, offering a direct communication channel. Moreover, egocentric videos mimic a first-person perspective, aligning closely with human visual perception. Yet, the development of deep learning algorithms for detecting egocentric hand gestures in autonomous driving is hindered by the absence of useful datasets. Furthermore, there is a pressing need for gesture recognition methods to evolve from convolutional neural network (CNN)-based architectures to transformer models. To address these challenges, we present EgoDriving, a novel dataset of egocentric hand gestures, curated for driving-related hand gestures. Finally, we introduce EgoFormer, an efficient video transformer for egocentric hand gesture classification that is optimized for edge-computing deployments. EgoFormer incorporates a video dynamic position bias (VDPB) module to enhance long-range positional awareness and leverage absolute positions from convolutional sub-layers within its transformer blocks. Designed for low-resource settings, EgoFormer offers substantial reductions in inference latency and GPU utilization while maintaining competitive accuracy against the state-of-the-art hand gesture recognition frameworks.","PeriodicalId":447,"journal":{"name":"IEEE Sensors Journal","volume":"24 11","pages":"18133-18140"},"PeriodicalIF":4.3000,"publicationDate":"2024-04-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"EgoFormer: Ego-Gesture Classification in Context of Autonomous Driving\",\"authors\":\"Tayeba Qazi;M. Rupesh Kumar;Prerana Mukherjee;Brejesh Lall\",\"doi\":\"10.1109/JSEN.2024.3390794\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Decoding the intentions of passengers and other road users remains a critical challenge for autonomous vehicles (AVs) and intelligent transportation systems. Hand gestures are key in these interactions, offering a direct communication channel. Moreover, egocentric videos mimic a first-person perspective, aligning closely with human visual perception. Yet, the development of deep learning algorithms for detecting egocentric hand gestures in autonomous driving is hindered by the absence of useful datasets. Furthermore, there is a pressing need for gesture recognition methods to evolve from convolutional neural network (CNN)-based architectures to transformer models. To address these challenges, we present EgoDriving, a novel dataset of egocentric hand gestures, curated for driving-related hand gestures. Finally, we introduce EgoFormer, an efficient video transformer for egocentric hand gesture classification that is optimized for edge-computing deployments. EgoFormer incorporates a video dynamic position bias (VDPB) module to enhance long-range positional awareness and leverage absolute positions from convolutional sub-layers within its transformer blocks. 
Designed for low-resource settings, EgoFormer offers substantial reductions in inference latency and GPU utilization while maintaining competitive accuracy against the state-of-the-art hand gesture recognition frameworks.\",\"PeriodicalId\":447,\"journal\":{\"name\":\"IEEE Sensors Journal\",\"volume\":\"24 11\",\"pages\":\"18133-18140\"},\"PeriodicalIF\":4.3000,\"publicationDate\":\"2024-04-24\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"IEEE Sensors Journal\",\"FirstCategoryId\":\"103\",\"ListUrlMain\":\"https://ieeexplore.ieee.org/document/10508297/\",\"RegionNum\":2,\"RegionCategory\":\"综合性期刊\",\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"Q1\",\"JCRName\":\"ENGINEERING, ELECTRICAL & ELECTRONIC\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"IEEE Sensors Journal","FirstCategoryId":"103","ListUrlMain":"https://ieeexplore.ieee.org/document/10508297/","RegionNum":2,"RegionCategory":"综合性期刊","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"ENGINEERING, ELECTRICAL & ELECTRONIC","Score":null,"Total":0}
Abstract: Decoding the intentions of passengers and other road users remains a critical challenge for autonomous vehicles (AVs) and intelligent transportation systems. Hand gestures are key in these interactions, offering a direct communication channel. Moreover, egocentric videos mimic a first-person perspective, aligning closely with human visual perception. Yet, the development of deep learning algorithms for detecting egocentric hand gestures in autonomous driving is hindered by the absence of suitable datasets. Furthermore, there is a pressing need for gesture recognition methods to evolve from convolutional neural network (CNN)-based architectures to transformer models. To address these challenges, we present EgoDriving, a novel dataset of egocentric hand gestures curated for driving-related interactions. We also introduce EgoFormer, an efficient video transformer for egocentric hand gesture classification, optimized for edge-computing deployments. EgoFormer incorporates a video dynamic position bias (VDPB) module to enhance long-range positional awareness and to leverage absolute positions from convolutional sub-layers within its transformer blocks. Designed for low-resource settings, EgoFormer substantially reduces inference latency and GPU utilization while maintaining accuracy competitive with state-of-the-art hand gesture recognition frameworks.
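To make the architectural idea concrete, the sketch below shows one way a video transformer block can combine a depthwise-convolutional sub-layer (supplying absolute positional cues) with a dynamic position bias added to the attention logits, in the spirit of the VDPB module mentioned in the abstract. This is a minimal illustrative PyTorch sketch under assumed design choices; the class names, token shapes, and wiring are not taken from the paper and should not be read as EgoFormer's actual implementation.

```python
# Illustrative sketch only: conv positional sub-layer + dynamic position bias
# attention. All sizes and module names are assumptions, not the authors' code.
import torch
import torch.nn as nn


class DynamicPositionBias(nn.Module):
    """Maps relative (t, h, w) offsets to per-head attention biases via a small MLP."""

    def __init__(self, num_heads: int, hidden: int = 64):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(3, hidden), nn.ReLU(inplace=True),
            nn.Linear(hidden, num_heads),
        )

    def forward(self, rel_coords: torch.Tensor) -> torch.Tensor:
        # rel_coords: (L, L, 3) relative offsets between all pairs of video tokens
        return self.mlp(rel_coords).permute(2, 0, 1)  # -> (num_heads, L, L)


class VideoBlock(nn.Module):
    """One block: conv positional sub-layer -> biased self-attention -> MLP."""

    def __init__(self, dim: int, num_heads: int, video_shape=(8, 7, 7)):
        super().__init__()
        self.video_shape = video_shape  # (T, H, W) tokens per clip (assumed)
        # Depthwise 3-D conv injects absolute positional information.
        self.pos_conv = nn.Conv3d(dim, dim, kernel_size=3, padding=1, groups=dim)
        self.norm1 = nn.LayerNorm(dim)
        self.attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.dpb = DynamicPositionBias(num_heads)
        self.norm2 = nn.LayerNorm(dim)
        self.mlp = nn.Sequential(nn.Linear(dim, 4 * dim), nn.GELU(), nn.Linear(4 * dim, dim))

    def _bias(self, device) -> torch.Tensor:
        # Recomputed each call for simplicity; a real model would cache this.
        T, H, W = self.video_shape
        coords = torch.stack(torch.meshgrid(
            torch.arange(T), torch.arange(H), torch.arange(W), indexing="ij"
        ), dim=-1).reshape(-1, 3).float().to(device)   # (L, 3)
        rel = coords[:, None, :] - coords[None, :, :]  # (L, L, 3)
        return self.dpb(rel)                           # (num_heads, L, L)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (B, L, C) with L = T*H*W flattened video tokens
        B, L, C = x.shape
        T, H, W = self.video_shape
        v = x.transpose(1, 2).reshape(B, C, T, H, W)
        x = x + self.pos_conv(v).reshape(B, C, L).transpose(1, 2)
        # Dynamic position bias is applied as an additive attention mask.
        bias = self._bias(x.device).repeat(B, 1, 1)    # (B*num_heads, L, L)
        h = self.norm1(x)
        attn_out, _ = self.attn(h, h, h, attn_mask=bias)
        x = x + attn_out
        return x + self.mlp(self.norm2(x))
```

For instance, `VideoBlock(dim=128, num_heads=4)(torch.randn(2, 8 * 7 * 7, 128))` returns a tensor of the same shape; stacking such blocks over patch-embedded video tokens and adding pooling plus a classification head would yield a gesture classifier, though the paper should be consulted for EgoFormer's actual configuration.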
Journal introduction:
The fields of interest of the IEEE Sensors Journal are the theory, design, fabrication, manufacturing, and applications of devices for sensing and transducing physical, chemical, and biological phenomena, with emphasis on the electronics and physics aspects of sensors and integrated sensor-actuators. IEEE Sensors Journal deals with the following:
-Sensor Phenomenology, Modelling, and Evaluation
-Sensor Materials, Processing, and Fabrication
-Chemical and Gas Sensors
-Microfluidics and Biosensors
-Optical Sensors
-Physical Sensors: Temperature, Mechanical, Magnetic, and others
-Acoustic and Ultrasonic Sensors
-Sensor Packaging
-Sensor Networks
-Sensor Applications
-Sensor Systems: Signals, Processing, and Interfaces
-Actuators and Sensor Power Systems
-Sensor Signal Processing for high precision and stability (amplification, filtering, linearization, modulation/demodulation) and under harsh conditions (EMC, radiation, humidity, temperature); energy consumption/harvesting
-Sensor Data Processing (soft computing with sensor data, e.g., pattern recognition, machine learning, evolutionary computation; sensor data fusion; processing of wave (e.g., electromagnetic, acoustic) and non-wave (e.g., chemical, gravity, particle, thermal, radiative and non-radiative) sensor data; detection, estimation, and classification based on sensor data)
-Sensors in Industrial Practice