American Sign Language Fingerspelling Recognition in the Wild

2018 IEEE Spoken Language Technology Workshop (SLT). Pub Date: 2018-10-26. DOI: 10.1109/SLT.2018.8639639

Bowen Shi, Aurora Martinez Del Rio, J. Keane, Jonathan Michaux, D. Brentari, Gregory Shakhnarovich, Karen Livescu


Citations: 47

Abstract

We address the problem of American Sign Language fingerspelling recognition “in the wild”, using videos collected from websites. We introduce the largest data set available so far for the problem of fingerspelling recognition, and the first using naturally occurring video data. Using this data set, we present the first attempt to recognize fingerspelling sequences in this challenging setting. Unlike prior work, our video data is extremely challenging due to low frame rates and visual variability. To tackle the visual challenges, we train a special-purpose signing hand detector using a small subset of our data. Given the hand detector output, a sequence model decodes the hypothesized fingerspelled letter sequence. For the sequence model, we explore attention-based recurrent encoder-decoders and CTC-based approaches. As the first attempt at fingerspelling recognition in the wild, this work is intended to serve as a baseline for future work on sign language recognition in realistic conditions. We find that, as expected, letter error rates are much higher than in previous work on more controlled data, and we analyze the sources of error and effects of model variants.
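The abstract names CTC-based decoding and letter error rate without defining either. The following is a minimal sketch of both ideas, not the authors' implementation. It assumes greedy per-frame labels as input, a hypothetical blank symbol `_`, and the common definition of letter error rate as edit distance divided by reference length; the function names are illustrative only.

```python
from typing import List, Optional, Sequence

BLANK = "_"  # hypothetical blank symbol chosen for this sketch


def ctc_collapse(frame_labels: Sequence[str], blank: str = BLANK) -> str:
    """Standard CTC collapsing: merge repeated frame labels, then drop blanks."""
    letters: List[str] = []
    previous: Optional[str] = None
    for label in frame_labels:
        # Emit a letter only when it differs from the previous frame
        # and is not the blank symbol.
        if label != previous and label != blank:
            letters.append(label)
        previous = label
    return "".join(letters)


def edit_distance(reference: str, hypothesis: str) -> int:
    """Levenshtein distance (substitutions, insertions, deletions cost 1 each)."""
    prev_row = list(range(len(hypothesis) + 1))
    for i, ref_char in enumerate(reference, start=1):
        row = [i]
        for j, hyp_char in enumerate(hypothesis, start=1):
            row.append(min(
                prev_row[j] + 1,                           # deletion
                row[j - 1] + 1,                            # insertion
                prev_row[j - 1] + (ref_char != hyp_char),  # substitution or match
            ))
        prev_row = row
    return prev_row[-1]


def letter_error_rate(reference: str, hypothesis: str) -> float:
    """Edit distance normalized by reference length (assumed definition)."""
    if not reference:
        raise ValueError("reference must be non-empty")
    return edit_distance(reference, hypothesis) / len(reference)
```

For example, `ctc_collapse(list("aa_b_bb__c"))` merges the repeated frames and keeps the two `b` letters that are separated by a blank, and `letter_error_rate` can then compare the collapsed string against a ground-truth letter sequence. Because insertions are counted, the rate can exceed 1.0 when the hypothesis is much longer than the reference.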
