Virtual Hand-Gesture Interaction System based on SRHandNet with High Frame Rate
Wei Wei, Zesong Yang, Qianru Li, Sirui Tao
2022 37th Youth Academic Annual Conference of Chinese Association of Automation (YAC), 2022-11-19
DOI: 10.1109/yac57282.2022.10023852
Hand gesture recognition plays an important role in virtual human-computer interaction systems. However, existing methods often rely on a variety of smart sensors or data gloves, which significantly raises the cost of, and the barrier to, research in this area. This paper therefore sets out to recognize hand key points from a low-cost monocular camera. Previous work on hand key point recognition with monocular cameras suffers from high latency and instability, since each frame is processed independently by a traditional deep-learning backbone that demands substantial computing power. To alleviate this issue, we build a real-time interaction system with a monocular camera based on SRHandNet, a simplified yet capable deep neural network for hand detection. We optimize the original model with TensorRT and stabilize its output with Kalman filtering, aiming at a more stable and efficient application on both Windows and edge platforms (e.g., NVIDIA Jetson), on which the capture frame rate of the interaction system is doubled. By estimating the camera position, we realize a mapping between 2D and 3D hand key points, enabling virtual human-machine interaction. A series of experiments validated the superiority of the proposed model over baselines, suggesting that this work has great prospects for model deployment and hand pose recognition in augmented reality applications.
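The abstract does not give implementation details for the Kalman filtering step, so the following is only a minimal sketch of how per-key-point smoothing is commonly done: a constant-velocity Kalman filter over each 2D key point, with the network's raw (x, y) prediction as the measurement. The class name, state layout, and noise values here are illustrative assumptions, not the paper's code.

```python
import numpy as np

class KeypointKalman:
    """Constant-velocity Kalman filter for one 2D hand key point.

    State vector: [x, y, vx, vy]. The process/measurement noise
    magnitudes are hypothetical tuning values for illustration.
    """

    def __init__(self, dt=1.0 / 30.0, process_var=1e-2, meas_var=1.0):
        # State transition: position advances by velocity * dt.
        self.F = np.array([[1, 0, dt, 0],
                           [0, 1, 0, dt],
                           [0, 0, 1, 0],
                           [0, 0, 0, 1]], dtype=float)
        # We observe only the (x, y) position.
        self.H = np.array([[1, 0, 0, 0],
                           [0, 1, 0, 0]], dtype=float)
        self.Q = process_var * np.eye(4)   # process noise covariance
        self.R = meas_var * np.eye(2)      # measurement noise covariance
        self.x = np.zeros(4)               # state estimate
        self.P = np.eye(4)                 # state covariance

    def update(self, z):
        """Predict one step, then correct with measured (x, y); returns
        the smoothed 2D position."""
        # Predict.
        self.x = self.F @ self.x
        self.P = self.F @ self.P @ self.F.T + self.Q
        # Correct.
        z = np.asarray(z, dtype=float)
        innovation = z - self.H @ self.x
        S = self.H @ self.P @ self.H.T + self.R
        K = self.P @ self.H.T @ np.linalg.inv(S)
        self.x = self.x + K @ innovation
        self.P = (np.eye(4) - K @ self.H) @ self.P
        return self.x[:2]
```

Smoothing each of the 21 hand key points with an independent filter of this kind suppresses frame-to-frame jitter at negligible cost, which is consistent with the stability goal described above.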
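The 2D-to-3D key point mapping is likewise not spelled out in the abstract; under a standard pinhole camera model, a 2D pixel with a known or estimated depth can be lifted into the 3D camera frame as sketched below. The intrinsic parameters (fx, fy, cx, cy) and the depth value are assumed inputs, not values from the paper.

```python
import numpy as np

def backproject(u, v, depth, fx, fy, cx, cy):
    """Lift a 2D pixel (u, v) at an assumed depth (in camera-frame units)
    to a 3D point using the pinhole camera model:
        X = (u - cx) * depth / fx,  Y = (v - cy) * depth / fy,  Z = depth.
    """
    X = (u - cx) * depth / fx
    Y = (v - cy) * depth / fy
    return np.array([X, Y, depth], dtype=float)
```

A pixel at the principal point maps straight onto the optical axis, e.g. `backproject(cx, cy, d, ...)` yields `[0, 0, d]`; applying this to each smoothed 2D key point gives the 3D hand pose used for the virtual interaction.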