An alignment based similarity measure for hand detection in cluttered sign language video

2009 IEEE Computer Society Conference on Computer Vision and Pattern Recognition Workshops. Pub Date: 2009-06-20. DOI: 10.1109/CVPRW.2009.5204266

Ashwin Thangali, S. Sclaroff


Citations: 10

Abstract

Locating hands in sign language video is challenging due to a number of factors. Hand appearance varies widely across signers due to anthropometric variations and varying levels of signer proficiency. Video can be captured under varying illumination, camera resolutions, and levels of scene clutter, e.g., high-res video captured in a studio vs. low-res video gathered by a Web cam in a user's home. Moreover, the signers' clothing varies, e.g., skin-toned clothing vs. contrasting clothing, short-sleeved vs. long-sleeved shirts, etc. In this work, the hand detection problem is addressed in an appearance matching framework. The histogram of oriented gradient (HOG) based matching score function is reformulated to allow non-rigid alignment between pairs of images to account for hand shape variation. The resulting alignment score is used within a support vector machine hand/not-hand classifier for hand detection. The new matching score function yields improved performance (in ROC area and hand detection rate) over the vocabulary guided pyramid match kernel (VGPMK) and the traditional, rigid HOG distance on American Sign Language video gestured by expert signers. The proposed match score function is computationally less expensive (for training and testing), has fewer parameters and is less sensitive to parameter settings than VGPMK. The proposed detector works well on test sequences from an inexpert signer in a non-studio setting with cluttered background.
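The abstract does not give the formula for the alignment score, so the following is only a minimal sketch of the general idea, not the authors' formulation. It assumes each image patch is already described by a grid of HOG cell histograms stored as an array of shape (rows, cols, bins). The function names, the `radius` parameter, and the use of per-cell Euclidean distance are all illustrative assumptions. A rigid distance compares cells at identical grid positions, while the aligned variant lets each cell match its best counterpart within a small neighbourhood, which tolerates local shape variation.

```python
import numpy as np


def rigid_hog_distance(a, b):
    """Sum of per-cell L2 distances, comparing cells at identical grid positions."""
    return float(np.linalg.norm(a - b, axis=-1).sum())


def aligned_hog_distance(a, b, radius=1):
    """Illustrative non-rigid variant (an assumption, not the paper's score).

    Each cell of `a` is matched to the closest cell of `b` found within
    `radius` grid cells of the same position.
    """
    rows, cols, bins = a.shape
    total = 0.0
    for r in range(rows):
        for c in range(cols):
            # Clip the search window to the grid boundaries.
            r0, r1 = max(0, r - radius), min(rows, r + radius + 1)
            c0, c1 = max(0, c - radius), min(cols, c + radius + 1)
            window = b[r0:r1, c0:c1].reshape(-1, bins)
            total += float(np.linalg.norm(window - a[r, c], axis=1).min())
    return total
```

With `radius=0` the aligned distance reduces to the rigid one, and for any larger radius it can never exceed it, because the search window always contains the cell at the same position. A score of this kind could then be fed to a support vector machine as the abstract describes, though how the paper does so is not specified here.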

