TieLent

Proceedings of the International Conference on Advanced Visual Interfaces | Pub Date: 2020-09-28 | DOI: 10.1145/3399715.3399852

N. Kimura, Kentaro Hayashi, J. Rekimoto


Citations: 13

Abstract


With the increased use of smart speakers, silent speech interaction (SSI) is attracting attention. Unfortunately, traditional silent speech interaction methods require the addition of obtrusive sensors and devices around the user's face, making wearability and portability a challenge. Considering that most uses for smart speakers do not require many words, we suggest a more casual approach, TieLent, which can easily be worn between the neck and the chest. TieLent's RGB camera is set away from the user's face, presenting less interference with the user. Although TieLent's camera is not able to capture the whole mouth, when combined with our image-to-speech neural network model, it is able to generate the recognizable speech of 15 commands with an average accuracy of 94%.


Source journal

Proceedings of the International Conference on Advanced Visual Interfaces

Self-citation rate

0.00%

Publication count