Weakly-supervised Fingerspelling Recognition in British Sign Language Videos

BMVC: Proceedings of the British Machine Vision Conference. Pub Date: 2022-11-16. DOI: 10.48550/arXiv.2211.08954

Prajwal K R, Hannah Bull, Liliane Momeni, Samuel Albanie, Gül Varol, Andrew Zisserman


Citations: 4

Abstract

The goal of this work is to detect and recognize sequences of letters signed using fingerspelling in British Sign Language (BSL). Previous fingerspelling recognition methods have not focused on BSL, whose signing alphabet differs substantially from that of American Sign Language (ASL) (e.g., two-handed instead of one-handed). They also rely on manual annotations for training. In contrast, our method uses only weak annotations from subtitles for training. We localize potential instances of fingerspelling using a simple feature similarity method, then automatically annotate these instances by querying subtitle words and searching for corresponding mouthing cues from the signer. We propose a Transformer architecture adapted to this task, with a multiple-hypothesis CTC loss function to learn from alternative annotation possibilities. We employ a multi-stage training approach, in which an initial version of our trained model is used to extend and enhance the training data before re-training to achieve better performance. Through extensive evaluations, we verify our method for automatic annotation and our model architecture. Moreover, we provide a test set of 5K video clips annotated by human experts for evaluating BSL fingerspelling recognition methods, to support sign language research.
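
To make the idea of a multiple-hypothesis CTC loss more concrete, here is a minimal sketch in plain Python. The abstract does not state how the alternative annotations are combined, so the choice below (score each candidate letter sequence with a standard CTC loss and keep the minimum) is an assumption for illustration and not necessarily the authors' formulation. The function names `ctc_neg_log_likelihood` and `multi_hypothesis_ctc` are hypothetical.

```python
import math

NEG_INF = float("-inf")


def logsumexp(*xs):
    """Numerically stable log(sum(exp(x))) over the given values."""
    m = max(xs)
    if m == NEG_INF:
        return NEG_INF
    return m + math.log(sum(math.exp(x - m) for x in xs))


def ctc_neg_log_likelihood(log_probs, target, blank=0):
    """Standard CTC loss for one sequence (forward algorithm in log space).

    log_probs: list of T frames, each a list of per-class log-probabilities.
    target:    list of class indices without blanks.
    """
    # Interleave blanks: [blank, t1, blank, t2, ..., blank]
    ext = [blank]
    for c in target:
        ext += [c, blank]
    S = len(ext)

    alpha = [NEG_INF] * S
    alpha[0] = log_probs[0][blank]
    if S > 1:
        alpha[1] = log_probs[0][ext[1]]

    for t in range(1, len(log_probs)):
        new = [NEG_INF] * S
        for s in range(S):
            terms = [alpha[s]]
            if s >= 1:
                terms.append(alpha[s - 1])
            # Skipping over a blank is allowed only between different labels.
            if s >= 2 and ext[s] != blank and ext[s] != ext[s - 2]:
                terms.append(alpha[s - 2])
            new[s] = logsumexp(*terms) + log_probs[t][ext[s]]
        alpha = new

    # Valid alignments end on the last label or on the trailing blank.
    total = logsumexp(alpha[-1], alpha[-2]) if S > 1 else alpha[-1]
    return -total


def multi_hypothesis_ctc(log_probs, hypotheses, blank=0):
    """ASSUMPTION: combine hypotheses by taking the smallest CTC loss."""
    return min(ctc_neg_log_likelihood(log_probs, h, blank) for h in hypotheses)


# Toy usage: 2 frames, 3 classes (0 = blank), uniform predictions.
uniform = [[math.log(1 / 3)] * 3 for _ in range(2)]
loss = multi_hypothesis_ctc(uniform, [[1], [1, 2]])
```

With a minimum, only the best-matching candidate word contributes a gradient for a given clip; a soft minimum over the per-hypothesis losses would be an equally reasonable variant under the same assumption.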

