Katerina Papadimitriou, G. Potamianos
DOI: 10.1109/EUVIP.2018.8611755
2018 7th European Workshop on Visual Information Processing (EUVIP), November 2018
A Hybrid Approach to Hand Detection and Type Classification in Upper-Body Videos
Detection of hands in videos and their classification into left and right types are crucial in various human-computer interaction and data mining systems. A variety of effective deep learning methods have been proposed for this task, such as region-based convolutional neural networks (R-CNNs); however, the large number of proposal windows they generate per frame renders them computationally intensive. To address this, we propose a hybrid approach that replaces the "selective search" module of the R-CNN with an image processing pipeline that assumes visibility of the facial region, as is the case, for example, in signing and cued speech videos. Our system comprises two main phases: preprocessing and classification. In the preprocessing stage, we incorporate facial information, obtained by an AdaBoost face detector, into a skin-tone based segmentation scheme that drives Kalman-filtering based hand tracking, generating very few candidate windows. During classification, the extracted proposal regions are fed to a CNN for hand detection and type classification. Evaluation of the proposed hybrid approach on four well-known gesture and sign language datasets demonstrates its superior accuracy and computational efficiency over the R-CNN and its variants.
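The two preprocessing ingredients named in the abstract — skin-tone based segmentation and Kalman-filtering based hand tracking — can be illustrated with a minimal sketch. The YCrCb chroma thresholds and the constant-velocity state model below are common heuristics chosen for illustration; they are assumptions, not the paper's exact parameters or implementation.

```python
import numpy as np

def skin_mask_ycrcb(img_ycrcb):
    """Illustrative skin-tone mask: threshold the Cr/Cb chroma channels.
    The threshold ranges are a common heuristic, not the paper's values."""
    cr, cb = img_ycrcb[..., 1], img_ycrcb[..., 2]
    return (cr >= 133) & (cr <= 173) & (cb >= 77) & (cb <= 127)

class HandKalman:
    """Constant-velocity Kalman filter over a 2-D hand centroid (x, y).
    State vector: [x, y, vx, vy]; only position is observed."""

    def __init__(self, x0, y0, q=1e-2, r=1.0):
        self.x = np.array([x0, y0, 0.0, 0.0])  # initial state
        self.P = np.eye(4)                     # state covariance
        self.F = np.eye(4)                     # transition: x += vx, y += vy
        self.F[0, 2] = self.F[1, 3] = 1.0
        self.H = np.eye(2, 4)                  # measurement picks (x, y)
        self.Q = q * np.eye(4)                 # process noise (assumed)
        self.R = r * np.eye(2)                 # measurement noise (assumed)

    def predict(self):
        """Propagate the state one frame ahead; returns predicted (x, y)."""
        self.x = self.F @ self.x
        self.P = self.F @ self.P @ self.F.T + self.Q
        return self.x[:2]

    def update(self, z):
        """Fuse a measured centroid z = (x, y) from the skin-tone mask."""
        y = np.asarray(z, dtype=float) - self.H @ self.x  # innovation
        S = self.H @ self.P @ self.H.T + self.R           # innovation cov.
        K = self.P @ self.H.T @ np.linalg.inv(S)          # Kalman gain
        self.x = self.x + K @ y
        self.P = (np.eye(4) - K @ self.H) @ self.P
        return self.x[:2]
```

In a full pipeline along the lines the abstract describes, the face region found by the detector would seed the skin model, the mask's connected components would supply measured hand centroids, and the filter's predictions would keep the candidate windows few and stable across frames.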