TieLent
N. Kimura, Kentaro Hayashi, J. Rekimoto
Proceedings of the International Conference on Advanced Visual Interfaces, 2020-09-28
DOI: 10.1145/3399715.3399852 (https://doi.org/10.1145/3399715.3399852)
Citations: 13
Abstract
With the increased use of smart speakers, silent speech interaction (SSI) is attracting attention. Unfortunately, traditional SSI methods require obtrusive sensors and devices around the user's face, which makes wearability and portability a challenge. Considering that most uses of smart speakers do not require many words, we propose a more casual approach, TieLent, which can easily be worn between the neck and the chest. TieLent's RGB camera is set away from the user's face, so it interferes less with the user. Although the camera cannot capture the whole mouth, when combined with our image-to-speech neural network model it can generate recognizable speech for 15 commands with an average accuracy of 94%.