Using Data-Driven Approach for Modeling Timing Parameters of American Sign Language

Sedeeq Al-khazraji
DOI: 10.1145/3242969.3264965
Published in: Proceedings of the 20th ACM International Conference on Multimodal Interaction
Publication date: 2018-10-02
Citations: 1

Abstract

While many organizations provide a website in multiple languages, few provide a sign-language version for deaf users, many of whom have lower written-language literacy. Rather than providing difficult-to-update videos of humans, a more practical solution would be for the organization to specify a script (representing the sequence of words) from which a sign-language animation is generated. The challenge is that we must select accurate speed and timing for the signs. In this work, focused on American Sign Language (ASL), motion-capture data recorded from humans is used to train machine-learning models that calculate realistic timing for ASL animation movement, with an initial focus on inserting prosodic breaks (pauses), adjusting the durations of these pauses, and adjusting the differential signing rate of ASL animations based on sentence syntax and other features. The methodology includes processing and cleaning data from an ASL corpus with motion-capture recordings, selecting features, and building machine-learning models to predict where to insert pauses, the length of those pauses, and the signing speed. The resulting models were evaluated using a cross-validation approach, training and testing multiple models on various partitions of the dataset to compare different learning algorithms and subsets of features. In addition, a user-based evaluation was conducted in which native ASL signers evaluated animations generated from these models. This paper summarizes the motivation for this work, the proposed solution, and its potential contributions. It describes both completed work and additional future research plans.
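The cross-validation design described above can be illustrated with a minimal sketch. The feature names (`clause_boundary`, `words_before`) and the toy rule-based pause predictor below are illustrative assumptions for exposition only, not the paper's actual features or models:

```python
# Hedged sketch: k-fold cross-validation of a binary "insert pause here?"
# predictor at candidate boundaries between signs. All names are hypothetical.
from dataclasses import dataclass
from typing import List

@dataclass
class SignBoundary:
    clause_boundary: bool   # does a syntactic clause end at this point?
    words_before: int       # number of signs since the previous pause
    pause_inserted: bool    # ground-truth label from the corpus recordings

def predict_pause(x: SignBoundary, min_run: int) -> bool:
    # Toy model: insert a pause at a clause boundary once enough signs
    # have accumulated since the previous pause.
    return x.clause_boundary and x.words_before >= min_run

def k_fold_accuracy(data: List[SignBoundary], k: int = 5) -> float:
    # Train and test on k partitions of the dataset, mirroring the
    # cross-validation evaluation described in the abstract.
    fold_size = max(1, len(data) // k)
    correct = total = 0
    for i in range(k):
        test = data[i * fold_size:(i + 1) * fold_size]
        # A real setup would tune min_run (or fit a learned model) on the
        # remaining folds; it is fixed here to keep the sketch short.
        for x in test:
            correct += predict_pause(x, min_run=4) == x.pause_inserted
            total += 1
    return correct / total if total else 0.0
```

A learned model (e.g. a decision tree or regression over richer syntactic features) would replace `predict_pause`, and analogous regressors would predict pause duration and signing speed.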