MuteIt: Jaw Motion Based Unvoiced Command Recognition Using Earable

Tanmay Srivastava, Prerna Khanna, Shijia Pan, Phuc Nguyen, S. Jain
{"title":"MuteIt: Jaw Motion Based Unvoiced Command Recognition Using Earable","authors":"Tanmay Srivastava, Prerna Khanna, Shijia Pan, Phuc Nguyen, S. Jain","doi":"10.1145/3550281","DOIUrl":null,"url":null,"abstract":"In this paper, we present MuteIt, an ear-worn system for recognizing unvoiced human commands. MuteIt presents an intuitive alternative to voice-based interactions that can be unreliable in noisy environments, disruptive to those around us, and compromise our privacy. We propose a twin-IMU set up to track the user's jaw motion and cancel motion artifacts caused by head and body movements. MuteIt processes jaw motion during word articulation to break each word signal into its constituent syllables, and further each syllable into phonemes (vowels, visemes, and plosives). Recognizing unvoiced commands by only tracking jaw motion is challenging. As a secondary articulator, jaw motion is not distinctive enough for unvoiced speech recognition. MuteIt combines IMU data with the anatomy of jaw movement as well as principles from linguistics, to model the task of word recognition as an estimation problem. Rather than employing machine learning to train a word classifier, we reconstruct each word as a sequence of phonemes using a bi-directional particle filter, enabling the system to be easily scaled to a large set of words. We validate MuteIt for 20 subjects with diverse speech accents to recognize 100 common command words. MuteIt achieves a mean word recognition accuracy of 94.8% in noise-free conditions. When compared with common voice assistants, MuteIt outperforms them in noisy acoustic environments, achieving higher than 90% recognition accuracy. Even in the presence of motion artifacts, such as head movement, walking, and riding in a moving vehicle, MuteIt achieves mean word recognition accuracy of 91% over all scenarios.","PeriodicalId":20463,"journal":{"name":"Proc. ACM Interact. Mob. Wearable Ubiquitous Technol.","volume":"51 1","pages":"1 - 26"},"PeriodicalIF":0.0000,"publicationDate":"2022-09-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"10","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Proc. ACM Interact. Mob. Wearable Ubiquitous Technol.","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1145/3550281","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 10

Abstract

In this paper, we present MuteIt, an ear-worn system for recognizing unvoiced human commands. MuteIt offers an intuitive alternative to voice-based interactions, which can be unreliable in noisy environments, disruptive to those around us, and a risk to our privacy. We propose a twin-IMU setup to track the user's jaw motion and cancel motion artifacts caused by head and body movements. MuteIt processes jaw motion during word articulation to break each word signal into its constituent syllables, and further each syllable into phonemes (vowels, visemes, and plosives). Recognizing unvoiced commands by tracking jaw motion alone is challenging: as a secondary articulator, the jaw does not move distinctively enough for unvoiced speech recognition. MuteIt therefore combines IMU data with the anatomy of jaw movement and principles from linguistics to model word recognition as an estimation problem. Rather than employing machine learning to train a word classifier, we reconstruct each word as a sequence of phonemes using a bi-directional particle filter, enabling the system to scale easily to a large set of words. We validate MuteIt with 20 subjects with diverse speech accents on 100 common command words. MuteIt achieves a mean word recognition accuracy of 94.8% in noise-free conditions. Compared with common voice assistants, MuteIt performs better in noisy acoustic environments, achieving recognition accuracy above 90%. Even in the presence of motion artifacts, such as head movement, walking, and riding in a moving vehicle, MuteIt achieves a mean word recognition accuracy of 91% across all scenarios.
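The abstract gives no implementation detail, so the following is only a rough, hypothetical sketch of the two ideas it names: differential cancellation with the twin-IMU setup, and phoneme-sequence decoding with a particle filter. Everything in it (function names, the Gaussian observation model, the transition prior, and all parameters) is an illustrative assumption rather than the authors' code; in particular, only a forward filtering pass is shown, whereas MuteIt's filter is bi-directional.

```python
# A minimal, hypothetical sketch of two ideas named in the abstract.
# None of this is the authors' implementation: the differencing scheme,
# Gaussian observation model, transition prior, and all names/parameters
# are assumptions made for illustration only.

import numpy as np


def cancel_motion_artifacts(jaw_imu: np.ndarray, ref_imu: np.ndarray) -> np.ndarray:
    """Twin-IMU differencing: the jaw-side IMU senses jaw plus head/body
    motion while the reference IMU senses head/body motion only, so
    subtracting the two synchronized (n_samples, 3) streams should leave
    mostly the jaw component."""
    jaw = jaw_imu - jaw_imu.mean(axis=0)  # remove per-axis bias/drift
    ref = ref_imu - ref_imu.mean(axis=0)
    return jaw - ref


def decode_word(features, phoneme_means, trans, n_particles=500, seed=0):
    """Toy forward (bootstrap) particle filter over phoneme sequences.

    features:      iterable of d-dim jaw feature vectors, one per segment
    phoneme_means: (K, d) array, a template feature vector per phoneme
    trans:         (K, K) row-stochastic phoneme-transition prior
    Returns the modal phoneme index for each segment.
    """
    rng = np.random.default_rng(seed)
    K = len(phoneme_means)
    particles = rng.integers(K, size=n_particles)  # uniform initial phonemes
    decoded = []
    for x in features:
        # Propagate: each particle samples its next phoneme from the prior.
        particles = np.array([rng.choice(K, p=trans[p]) for p in particles])
        # Weight: isotropic-Gaussian likelihood of the observed features.
        logw = -0.5 * np.sum((x - phoneme_means[particles]) ** 2, axis=1)
        w = np.exp(logw - logw.max())
        w /= w.sum()
        # Resample particles in proportion to their weights.
        particles = particles[rng.choice(n_particles, size=n_particles, p=w)]
        # Keep the most common phoneme among particles for this segment.
        decoded.append(int(np.bincount(particles, minlength=K).argmax()))
    return decoded
```

In this sketch, `features` would hold one jaw-motion feature vector per segment produced by the syllable/phoneme segmentation the abstract describes, `phoneme_means` a per-phoneme feature template, and `trans` a matrix encoding which phonemes may follow which, e.g. derived from linguistic constraints.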